The development of mechanised language specification based on structured operational semantics, with 
applications to verified compilers and sound program analysis, requires huge effort. General theory and 
frameworks have been proposed to help with this effort. However, none of this work provides a systematic way 
of developing concrete and abstract semantics, connected together by a general consistency result. We introduce 
a skeletal semantics of a language, where each skeleton describes the complete semantic behaviour of a language 
construct. We define a general notion of interpretation, which provides a systematic and language-independent 
way of deriving semantic judgements from the skeletal semantics. We explore four generic interpretations: a 
simple well-formedness interpretation; a concrete interpretation; an abstract interpretation; and a constraint 
generator for flow-sensitive analysis. We prove general consistency results between interpretations, depending 
only on simple language-dependent lemmas. We illustrate our ideas using a simple While language. 
Additional Key Words and Phrases: programming language, semantics, abstract interpretation 

ACM Reference Format: 

Martin Bodin, Philippa Gardner, Thomas Jensen, and Alan Schmitt. 2019. Skeletal Semantics and Their 
Interpretations. Proc. ACM Program. Lang. 3, POPL, Article 44 (January 2019), 31 pages. https://doi.org/10.1145/3290357 

1 INTRODUCTION 

Plotkin’s Structural Operational Semantics [Plotkin 1981] provides a methodology for formally 
describing a programming language using a collection of inference rules. It has been widely used to 
provide, for example, mechanised language specifications of substantial parts of ML [Owens 2008], 
C [Blazy and Leroy 2009; Norrish 1998] and JavaScript [Bodin et al. 2014]. These specifications have, 
in turn, been used to build verified compilers [Kumar et al. 2014; Leroy 2006] and to develop sound 
program analysis [Cachera et al. 2005; Jourdan et al. 2015; Klein and Nipkow 2002]. Such language 
specifications and their applications require huge effort, stretching the fundamental theory and 
tools to their limits. Researchers have therefore spent considerable thought developing general 
theories and frameworks where some of this effort can be unified for a wide class of languages. 

Abstract interpretation [Cousot and Cousot 1977] is a well-known general theory for analysing 
programs. It provides general definitions for describing when an abstract semantics is consistent 
(sound) with respect to a concrete semantics, and even suggests a methodology for how to construct 
consistent abstract semantics from concrete semantics [Cousot 1999; Midtgaard and Jensen 2008; 
Van Horn and Might 2011]. We focus on abstract semantics arising from concrete operational 
semantics. A prominent example can be found in the Verasco project [Jourdan et al. 2015] which 
provides a Coq-certified static analyser based on abstract interpretation, specifically targeting 
CompCert’s mechanised C specification. Schmidt [Schmidt 1995, 1997a] has demonstrated how to build 
abstract derivations from concrete derivations arising from an operational semantics, illustrating 
a close connection between the abstract and concrete semantics. The concepts are general, but 
the work does not attempt to be systematic. Inspired by Schmidt, Bodin et al. [Bodin et al. 2015] 
have identified a general rule format that can be systematically instantiated to both concrete and 
abstract semantics, with a general consistency result. However, their general rule format is based 
on a non-standard style of operational semantics, called pretty-big-step operational semantics 
[Charguéraud 2013], introduced to provide a Coq-mechanised specification of JavaScript [Bodin et al. 
2014]. It does not provide a general systematic approach for constructing an abstract semantics 
from a standard operational semantics. 

A general framework provides a unifying meta-language for writing operational inference rules, 
in order to develop general environments for analysis [Harper et al. 1987; Jung et al. 2017; Pfenning 
and Schürmann 1999; Roşu and Şerbănuţă 2010]. Much of the work on frameworks does not aim 
to describe abstract analysis. One notable exception is the Iris framework [Jung et al. 2017] for 
reasoning about concurrent programs. Iris provides a systematic method for building a concurrent 
program logic from concrete operational semantics, proving a general consistency result. It starts 
from a concrete operational semantics and generically builds the program logic. Consequently, 
the general consistency result relies on language-dependent lemmas which require an induction 
over the possibly complex constructs of the language. It does not work with abstract semantics in 
general, and the lemmas associated with the general consistency result are difficult to prove. 

We introduce a new approach. We have developed a meta-language, which we call a skeletal 
semantics, from which it is possible to construct systematically both concrete and abstract semantics, 
and prove a general consistency result. Our skeletal semantics comprises: 

• skeletons, where each skeleton describes the complete behaviour of one language construct; 

• generic interpretations, which systematically derive semantic judgements from the skeletons: 
for example, a generic concrete interpretation built using the usual concrete judgements of 
an operational semantics, parameterised by an input state, command and output state, and a 
generic abstract interpretation built from more abstract judgements over abstract domains; 

• a general consistency result between interpretations, which depends on simple language- 
dependent lemmas. 

Our definitions of skeletal semantics and interpretations have been mechanised in the Coq theorem 
prover, and the consistency result proved. 

Skeletal semantics can be used to describe languages specified using big-step operational 
semantics and languages specified using an English standard such as the ECMAScript standard. In 
this introductory paper, we focus on a simple While language as the illustrative example; the 
lambda calculus is given in the Coq artefact. Consider the usual if command of a While language, 
whose behaviour is typically defined in an operational semantics using two standard rules for 
the true and false case. Instead, in our skeletal semantics, the behaviour of the if command is 
given by one skeleton comprising: a semantic judgement, in this case parameterised by input state, 
expression and value, and instantiated via the interpretations with, for example, the usual concrete 
and abstract judgements for evaluating expressions; then a branch of two paths guarded by filters 
for determining the true and false case, followed by judgements for the appropriate subcommands. 

$$
\frac{\sigma, e \Downarrow \mathit{true} \qquad \sigma, t_1 \Downarrow \sigma'}
     {\sigma, \mathtt{if}\ e\ t_1\ t_2 \Downarrow \sigma'}
\qquad\qquad
\frac{\sigma, e \Downarrow \mathit{false} \qquad \sigma, t_2 \Downarrow \sigma'}
     {\sigma, \mathtt{if}\ e\ t_1\ t_2 \Downarrow \sigma'}
$$

Fig. 1. Usual concrete rules for the if construct 


Our if skeleton thus describes the information given in the two normal if rules, collected together 
under one syntactic construct. 

Skeletons provide all the information necessary to give systematically both concrete and 
abstract interpretations. Intuitively, our generic concrete interpretation picks one path from each 
branching/merging of the skeleton, whereas our generic abstract interpretation merges all the 
appropriate paths. In fact, our interpretations span many different types of analysis. The paper 
contains a simple well-formedness interpretation for simple sorts, suggesting that we can give 
many forms of standard well-formedness result associated with states and types. We also give an 
interpretation building a constraint generator for flow-sensitive analysis. We discuss other forms 
of analyses in future work. 

We have proved general consistency results between interpretations, which depend on simple 
language-dependent filter lemmas. These filter lemmas only describe properties of the filters of 
a language, which are functions on the language values. The complexity of proving these filter 
lemmas thus only depends on the complexity of the filters, which are simple in comparison with 
the complexity of the whole language. We explore the instantiation of our consistency result for 
our While language, demonstrating the consistency of the abstract interpretation with respect 
to the concrete interpretation for a selection of domains, as well as the consistency between the 
constraint generation and the abstract interpretation. 

In summary, we have come a long way towards answering the challenge of developing a 
language-independent framework for relating concrete and abstract semantics. The real test will come when 
we move from the simple languages explored in this paper to real-world languages such as OCaml 
and JavaScript, discussed in the future work. 

1.1 Example: the While Language 

We demonstrate our skeletal semantics in action using the simple conditional statement from the 
While language. Consider the usual concrete rules associated with the conditional statement in 
Figure 1, and the abstract rules in Figure 2, supposing that the Booleans are abstracted by the usual 
four-valued lattice given by $\{\mathit{true}^\sharp, \mathit{false}^\sharp, \top_{\mathit{bool}}, \bot_{\mathit{bool}}\}$. These abstract rules are intuitively correct, 
but they are first built in an ad hoc way and then shown to be related to the concrete rules using a 
Galois connection. More generally, the systematic construction of abstract rules from concrete rules 
requires a deep understanding of how the analysed programming language evaluates expressions: 
in a case like a vanilla While language, this is quite straightforward; for a complex language such 
as JavaScript [Bodin et al. 2014; ECMA 2018; Maffeis et al. 2008], the relationship between the 
concrete and abstract semantics can be difficult to get right. 

We define the skeletal semantics in Section 2, which provides a general meta-theory for defining 
language semantics. Figure 3 shows the skeleton associated with the if construct, with generic 
subterms denoted by $x_{t_1}$, $x_{t_2}$, and $x_{t_3}$, input state $x_\sigma$ and output state $x_o$. Judgements of the form 
$H(-, x_{t_i}, -)$ identify the required subcomputations associated with the subterms $x_{t_1}$, $x_{t_2}$, $x_{t_3}$. The 
skeleton stitches these judgements together, using the input and output states, the internal symbolic 
variable $x_{f_1}$, and the branching which identifies paths through the skeleton using the filters isTrue 
and isFalse, resulting in the output state $x_o$. Such a skeleton thus explicitly describes both the data 


Proc. ACM Program. Lang., Vol. 3, No. POPL, Article 44. Publication date: January 2019. 






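$$
\frac{\sigma, e \Downarrow \mathit{true}^\sharp \qquad \sigma, t_1 \Downarrow \sigma'}
     {\sigma, \mathtt{if}\ e\ t_1\ t_2 \Downarrow \sigma'}
\qquad\qquad
\frac{\sigma, e \Downarrow \mathit{false}^\sharp \qquad \sigma, t_2 \Downarrow \sigma'}
     {\sigma, \mathtt{if}\ e\ t_1\ t_2 \Downarrow \sigma'}
$$

$$
\frac{\sigma, e \Downarrow \top_{\mathit{bool}} \qquad \sigma, t_1 \Downarrow \sigma' \qquad \sigma, t_2 \Downarrow \sigma'}
     {\sigma, \mathtt{if}\ e\ t_1\ t_2 \Downarrow \sigma'}
\qquad\qquad
\frac{\sigma, e \Downarrow \bot_{\mathit{bool}}}
     {\sigma, \mathtt{if}\ e\ t_1\ t_2 \Downarrow \bot}
$$

Fig. 2. Usual abstract rules for the if construct 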


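$$
\textsc{If}(\mathtt{if}\ x_{t_1}\ x_{t_2}\ x_{t_3}) := H(x_\sigma, x_{t_1}, x_{f_1});\
\left(\begin{array}{l}
  \mathit{isTrue}(x_{f_1});\ H(x_\sigma, x_{t_2}, x_o) \\
  \mathit{isFalse}(x_{f_1});\ H(x_\sigma, x_{t_3}, x_o)
\end{array}\right)_{\{x_o\}}
$$

Fig. 3. Skeleton for the if construct 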


flow and the control flow associated with a language construct, identifying the common pattern 
underlying the concrete and abstract rules. 

We provide a general definition of interpretation for our skeletal semantics in Section 3 and 
study four generic interpretations: 


• A simple well-formedness interpretation (Section 3), which states that the stitching of the 
skeleton in Figure 3 respects the sorting of the basic constructs. 

• The concrete interpretation (Section 4), which intuitively picks one path from each branching 
of the skeleton, corresponding to the two rules of Figure 1. 

• The abstract interpretation (Section 5), whose complex definition (Figure 10) boils down to the 
intuitive description given by the rule of Figure 4: a rule with optional branches, considering 
all paths compatible with the return value of the expression e. This rule naturally subsumes 
the four rules of Figure 2. 

• A constraint generator for flow-sensitive static analysis (Section 7). Although these constraints 
are different in nature to the abstract semantics, they are expressed in our meta-theory using 
the same mechanism: that is, an interpretation of the skeletal semantics. This provides a 
strong connection between them. 

We also provide general definitions of consistency between interpretations (Section 3.2), with 
general consistency proofs based on filter lemmas (Section 3.3). The shared structure of our different 
interpretations greatly eases the proof process. We use our consistency definitions to show that the 
abstract interpretation is correct with respect to the concrete interpretation, and that any solution 
to the constraints given by our constraint generator must give rise to a correct abstract semantics. 

Throughout the paper, we instantiate our definitions and results to the While language as a 
way of introducing our ideas and demonstrating how classic proof techniques based on an abstract 
interpretation of While can be captured with our approach (Section 6). We however emphasise that 
skeletons and interpretations, as well as their consistency proofs, are generic and can be applied 
to any programming language. To begin to illustrate this, we extend our While language with 
exceptions, input/output and a heap in Section 8. 

The definitions and proofs of Sections 2 to 5 have been formalised in Coq; those of Sections 6 
and 7 have been proven on paper. They are all available from the companion website¹. 

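¹ http://skeletons.inria.fr 

$$
\frac{\sigma^\sharp, e \Downarrow v^\sharp
      \qquad \left(\llbracket \mathit{isTrue} \rrbracket^\sharp(v^\sharp) \Rightarrow \sigma^\sharp, t_1 \Downarrow \sigma'^\sharp\right)
      \qquad \left(\llbracket \mathit{isFalse} \rrbracket^\sharp(v^\sharp) \Rightarrow \sigma^\sharp, t_2 \Downarrow \sigma'^\sharp\right)}
     {\sigma^\sharp, \mathtt{if}\ e\ t_1\ t_2 \Downarrow \sigma'^\sharp}
$$

Fig. 4. The abstract interpretation of the if construct: intuitive description. 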


c        Signature
const    lit → expr
var      ident → expr
+        (expr × expr) → expr
=        (expr × expr) → expr
¬        expr → expr

c        Signature
skip     stat
:=       (ident × expr) → stat
;        (stat × stat) → stat
if       (expr × stat × stat) → stat
while    (expr × stat) → stat

Fig. 5. Constructors for While 


2 SKELETAL SEMANTICS 

2.1 Terms 

Terms t of a skeletal semantics are built using base terms, term variables, and constructors. Base 
terms are left unspecified and correspond to the basic blocks of the syntax, such as literals or 
program identifiers. They are instantiated by interpretations. We assume a countable set of term 
variables, ranged over by $x_t$, and a finite set of constructors, ranged over by $c$. A term is thus a base 
term, a term variable, or a constructor applied to terms. 

We also assume a countable set of sorts, ranged over by s. The sorts are separated into base sorts, 
for base terms, and program sorts, for terms built using constructors. Any base term belongs to a 
single base sort. The signature of a constructor $c$, written $sig(c)$, is of the form $(s_1..s_n) \to s$, where 
$n$ is the arity of $c$, the $s_i$ for $i = 1..n$ are sorts, and $s$ is a program sort. 

Running Example. For the While language, the base sorts are ident for the program variables 
and lit for the literals. Program sorts are expr for expressions and stat for statements. The signature 
of constructors is given in Figure 5. 

Let $\Gamma$ be a mapping from term variables to sorts. Sorted terms are either base terms, term variables 
$x_t$ of sort $\Gamma(x_t)$, or a term $c(t_1..t_n)$ of sort $s$, where $c$ has signature $sig(c) = (s_1..s_n) \to s$ and the 
terms $t_1..t_n$ have the appropriate sort. We write $Sort_\Gamma(t)$ for the sort of $t$. Let $\Sigma$ be a mapping from 
term variables to terms such that $\forall x_t \in dom(\Sigma),\ Sort_\Gamma(\Sigma(x_t)) = \Gamma(x_t)$. We extend it to terms as 
$\Sigma(c(t_1..t_n)) = c(\Sigma(t_1)..\Sigma(t_n))$ when defined. We write $Sort(t)$ for $Sort_\emptyset(t)$ and $Tvar(t)$ for the set of 
term variables in $t$. We say $t$ is closed if $Tvar(t) = \emptyset$. In that case, we write $t : s$ for $Sort(t) = s$. 

Lemma 2.1. Let $t$ be a term, $\Sigma$ an environment mapping term variables to closed terms, and $\Gamma$ a sorting 
environment such that $Tvar(t) \subseteq dom(\Sigma)$, $Tvar(t) \subseteq dom(\Gamma)$, and for any $x_t \in Tvar(t)$ we have 
$\Gamma(x_t) = Sort(\Sigma(x_t))$. Then we have $Sort_\Gamma(t) = Sort(\Sigma(t))$. 
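The definitions of this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Base`, `TVar`, and `App`, the table `SIG` (a fragment of the While signatures), and the helpers `sort_of` and `subst` are hypothetical names introduced here, not part of the paper's Coq development. Base terms carry their base sort explicitly, which is one possible encoding among others.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Base:
    """A base term, tagged with its base sort (e.g. 'lit' or 'ident')."""
    value: object
    sort: str


@dataclass(frozen=True)
class TVar:
    """A term variable x_t."""
    name: str


@dataclass(frozen=True)
class App:
    """A constructor c applied to argument terms."""
    ctor: str
    args: Tuple[object, ...]


# sig(c) = (s_1..s_n) -> s for a fragment of the While constructors.
SIG = {
    "const": (("lit",), "expr"),
    "var": (("ident",), "expr"),
    "+": (("expr", "expr"), "expr"),
    ":=": (("ident", "expr"), "stat"),
}


def sort_of(t, gamma=None):
    """Compute Sort_Gamma(t); raise ValueError if t is not a sorted term."""
    gamma = gamma or {}
    if isinstance(t, Base):
        return t.sort
    if isinstance(t, TVar):
        if t.name not in gamma:
            raise ValueError(f"unbound term variable {t.name}")
        return gamma[t.name]
    arg_sorts, result = SIG[t.ctor]
    if tuple(sort_of(a, gamma) for a in t.args) != arg_sorts:
        raise ValueError(f"ill-sorted arguments for {t.ctor}")
    return result


def subst(env, t):
    """Extend the environment env (Sigma) from term variables to terms."""
    if isinstance(t, TVar):
        return env[t.name]
    if isinstance(t, App):
        return App(t.ctor, tuple(subst(env, a) for a in t.args))
    return t


# The open term x := e1 + e2, with term variables x, e1, e2.
t = App(":=", (TVar("x"), App("+", (TVar("e1"), TVar("e2")))))
gamma = {"x": "ident", "e1": "expr", "e2": "expr"}
env = {
    "x": Base("y", "ident"),
    "e1": App("const", (Base(1, "lit"),)),
    "e2": App("var", (Base("y", "ident"),)),
}
# The hypotheses of Lemma 2.1 hold here, so both sorts coincide.
assert sort_of(t, gamma) == sort_of(subst(env, t)) == "stat"
```

The final assertion checks one instance of Lemma 2.1: sorting the open term under `gamma` agrees with sorting the closed term obtained by applying `env`.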

2.2 Skeletons 

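We assume a countable set of flow variables, ranged over by $x_f$, which are used in the skeleton bodies 
to hold semantic values (states, intermediate values, ...). Among flow variables, we distinguish 
two of them: $x_\sigma$ holds the semantic state at the start of a skeleton, and $x_o$ is supposed to hold the 
semantic result at the end of a skeleton. We let skeletal variables, ranged over by $x$ or $y$, be the 
union of term variables and flow variables. A skeleton has the shape $\textsc{Name}(c(x_{t_1}..x_{t_n})) := S$, where 
Name is the skeleton name, $c$ is a constructor, $x_{t_1}..x_{t_n}$ are term variables, and $S$ is the skeleton body. 

$$
\begin{array}{lcl}
\textsc{Lit}(\mathit{const}(x_t)) & := & [\mathit{litInt}(x_t) \mathrel{?>} x_{f_1};\ \mathit{intVal}(x_{f_1}) \mathrel{?>} x_o] \\[2pt]
\textsc{Var}(\mathit{var}(x_t)) & := & [\mathit{read}(x_t, x_\sigma) \mathrel{?>} x_o] \\[2pt]
\textsc{Add}(x_{t_1} + x_{t_2}) & := & [H(x_\sigma, x_{t_1}, x_{f_1});\ \mathit{isInt}(x_{f_1}) \mathrel{?>} x_{f_1'};\ H(x_\sigma, x_{t_2}, x_{f_2}); \\
 & & \ \ \mathit{isInt}(x_{f_2}) \mathrel{?>} x_{f_2'};\ \mathit{add}(x_{f_1'}, x_{f_2'}) \mathrel{?>} x_{f_3};\ \mathit{intVal}(x_{f_3}) \mathrel{?>} x_o] \\[2pt]
\textsc{Eq}(x_{t_1} = x_{t_2}) & := & [H(x_\sigma, x_{t_1}, x_{f_1});\ \mathit{isInt}(x_{f_1}) \mathrel{?>} x_{f_1'};\ H(x_\sigma, x_{t_2}, x_{f_2}); \\
 & & \ \ \mathit{isInt}(x_{f_2}) \mathrel{?>} x_{f_2'};\ \mathit{eq}(x_{f_1'}, x_{f_2'}) \mathrel{?>} x_{f_3};\ \mathit{boolVal}(x_{f_3}) \mathrel{?>} x_o] \\[2pt]
\textsc{Neg}(\neg x_t) & := & [H(x_\sigma, x_t, x_{f_1});\ \mathit{isBool}(x_{f_1}) \mathrel{?>} x_{f_2};\ \mathit{neg}(x_{f_2}) \mathrel{?>} x_{f_3};\ \mathit{boolVal}(x_{f_3}) \mathrel{?>} x_o] \\[2pt]
\textsc{Skip}(\mathit{skip}) & := & [\mathit{id}(x_\sigma) \mathrel{?>} x_o] \\[2pt]
\textsc{Asn}(x_{t_1} := x_{t_2}) & := & [H(x_\sigma, x_{t_2}, x_{f_1});\ \mathit{write}(x_{t_1}, x_\sigma, x_{f_1}) \mathrel{?>} x_o] \\[2pt]
\textsc{Seq}(x_{t_1}; x_{t_2}) & := & [H(x_\sigma, x_{t_1}, x_{f_1});\ H(x_{f_1}, x_{t_2}, x_o)] \\[2pt]
\textsc{If}(\mathit{if}\ x_{t_1}\ x_{t_2}\ x_{t_3}) & := & \left[H(x_\sigma, x_{t_1}, x_{f_1});\ \mathit{isBool}(x_{f_1}) \mathrel{?>} x_{f_1'};\
  \left(\begin{array}{l}
    \mathit{isTrue}(x_{f_1'});\ H(x_\sigma, x_{t_2}, x_o) \\
    \mathit{isFalse}(x_{f_1'});\ H(x_\sigma, x_{t_3}, x_o)
  \end{array}\right)_{\{x_o\}}\right] \\[2pt]
\textsc{While}(\mathit{while}\ x_{t_1}\ x_{t_2}) & := & \left[H(x_\sigma, x_{t_1}, x_{f_1});\ \mathit{isBool}(x_{f_1}) \mathrel{?>} x_{f_1'};\
  \left(\begin{array}{l}
    \mathit{isTrue}(x_{f_1'});\ H(x_\sigma, x_{t_2}, x_{f_2});\ H(x_{f_2}, \mathit{while}\ x_{t_1}\ x_{t_2}, x_o) \\
    \mathit{isFalse}(x_{f_1'});\ \mathit{id}(x_\sigma) \mathrel{?>} x_o
  \end{array}\right)_{\{x_o\}}\right]
\end{array}
$$

Fig. 6. Skeletal semantics for While 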

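$$
\begin{array}{llcl}
\text{Skeleton Body} & S & ::= & [] \mid B; S \\
\text{Bone} & B & ::= & H(x_{f_1}, t, x_{f_2}) \mid F(x_1..x_n) \mathrel{?>} (y_1..y_m) \mid (S_1..S_n)_V
\end{array}
$$

where $H(-, -, -)$ is the (terminal) hook constructor and $F$ ranges over the set of filter functions. 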

A skeleton body is a sequence of bones. A bone is either a hook judgement $H(x_{f_1}, t, x_{f_2})$, built 
using the constructor $H(-, -, -)$ from an input flow variable $x_{f_1}$, a term $t$ to be hooked during 
interpretation, and an output flow variable $x_{f_2}$; or a filter $F(x_1..x_n) \mathrel{?>} (y_1..y_m)$ which tests if the 
values bound to its input skeletal variables $(x_1..x_n)$ satisfy a condition specified by $F$, and in that 
case outputs values to be bound to $(y_1..y_m)$; or a set of branches $(S_1..S_n)_V$ which represent the 
different behavioural pathways, where $V$ declares the skeletal variables that are shared and must 
be defined by all branches. 

A filter with no output skeletal variables is simply written $F(x_1..x_n)$. It then acts as a predicate. 

Requirement 2.2. We require that there exists exactly one skeleton for any given constructor c. 

Running Example. The skeletons of our While example are given in Figure 6. Requirement 2.2 is 
trivially satisfied. 
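To make the grammar of bones concrete, here is a minimal Python sketch of skeleton bodies as plain data. The class names `Hook`, `Filter`, and `Branches`, the string names for skeletal variables, and the helper `paths` are all assumptions made for illustration. The body encoded below follows the If skeleton of Figure 6, and `paths` enumerates the branch-free sequences of bones running through a body, which is the intuition behind "picking one path" at each branching.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Hook:
    """H(x_in, term, x_out): evaluate `term` from x_in, bind the result to x_out."""
    x_in: str
    term: object
    x_out: str


@dataclass(frozen=True)
class Filter:
    """F(inputs) ?> (outputs); acts as a predicate when `outputs` is empty."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Branches:
    """(S_1..S_n)_V: alternative bodies sharing the variables in `shared`."""
    bodies: Tuple[tuple, ...]
    shared: Tuple[str, ...]


# Body of the If skeleton; a skeleton body is encoded as a tuple of bones.
IF_BODY = (
    Hook("x_sigma", "x_t1", "x_f1"),
    Filter("isBool", ("x_f1",), ("x_f1'",)),
    Branches(
        bodies=(
            (Filter("isTrue", ("x_f1'",)), Hook("x_sigma", "x_t2", "x_o")),
            (Filter("isFalse", ("x_f1'",)), Hook("x_sigma", "x_t3", "x_o")),
        ),
        shared=("x_o",),
    ),
)


def paths(body):
    """Enumerate the branch-free bone sequences running through a body."""
    if not body:
        return [[]]
    head, tails = body[0], paths(body[1:])
    if isinstance(head, Branches):
        return [p + t for b in head.bodies for p in paths(b) for t in tails]
    return [[head] + t for t in tails]


if_paths = paths(IF_BODY)
# One path per branch of the conditional.
assert len(if_paths) == 2
for p in if_paths:
    print(" ; ".join(b.name if isinstance(b, Filter) else "H" for b in p))
```

Keeping bodies as immutable tuples of frozen dataclasses means a skeleton can be shared between several interpretations without risk of accidental mutation.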

2.3 Flow Sorts 

We extend the sorts with flow sorts, that are the sorts of values in interpretations. In our running 
example, flow sorts are store for the variable store, val for values, int for integers, and bool for 
Booleans. We relate flow sorts to hooks and filters as follows. 




f          fsort(f)
litInt     lit → int
intVal     int → val
isInt      val → int
add        (int, int) → int

f          fsort(f)
eq         (int, int) → bool
neg        bool → bool
read       (ident, store) → val
write      (ident, store, val) → store
id         store → store

f          fsort(f)
boolVal    bool → val
isBool     val → bool
isTrue     bool → ()
isFalse    bool → ()

Fig. 7. Filter sorts 


In a hook $H(x_{f_1}, t, x_{f_2})$, the flow variable $x_{f_1}$ stands for an input state that fits with $t$, and $x_{f_2}$ 
stands for a result. Given a program sort $s$, we define $in(s)$ as its input flow sort and $out(s)$ as its 
output flow sort. In our running example, the input flow sort of both expressions and statements is 
store. The output flow sort of expressions is val and the output flow sort of statements is store. 

Similarly, a filter $F(x_1..x_n) \mathrel{?>} (y_1..y_m)$ is assigned a signature, written $fsort(F)$, of the form 
$(s_1..s_n) \to (s'_1..s'_m)$. We write $()$ for the output sort of a filter if $m = 0$ and omit the enclosing 
parentheses when $n$ or $m$ is 1. Filter signatures for our running example are given in Figure 7. 

We check the consistency of the hook and filters with the skeletons in our well-formedness 
interpretation, introduced in Section 3.1. 

3 INTERPRETATIONS 

An interpretation $I$ specifies base terms and how to interpret the empty skeleton body, hooks, filters, 
and branches. It defines a set of interpretation states, ranged over by $\Sigma$ in this section but with 
specific notations for each interpretation, and a set of interpretation results, ranged over by $O$ in 
this section, as well as the following relations: 

• $\llbracket [] \rrbracket^I(\Sigma) \Downarrow O$ defining the interpretation of the empty skeleton body; 

• $\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^I(\Sigma) \Downarrow \Sigma'$ defining the interpretation of a hook; 

• $\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^I(\Sigma) \Downarrow \Sigma'$ defining the interpretation of a filter, for each filter $F$; 

• $\llbracket \oplus_n \rrbracket^I_V(\mathbf{O}, \Sigma) \Downarrow \Sigma'$ defining the merging of the interpretation of branches, where $\mathbf{O}$ is 
a partial function from $[1..n]$ to interpretation results, and where $V$ is the set of skeletal 
variables defined and shared by all branches. 

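Given a skeleton body $S$ and the relations above, we define the remaining cases for the 
interpretation of $S$ as follows. 

$$
\frac{\llbracket B \rrbracket^I(\Sigma) \Downarrow \Sigma' \qquad \llbracket S \rrbracket^I(\Sigma') \Downarrow O}
     {\llbracket B; S \rrbracket^I(\Sigma) \Downarrow O}
\qquad\qquad
\frac{\forall i \in dom(\mathbf{O}).\ \llbracket S_i \rrbracket^I(\Sigma) \Downarrow \mathbf{O}(i) \qquad \llbracket \oplus_n \rrbracket^I_V(\mathbf{O}, \Sigma) \Downarrow \Sigma'}
     {\llbracket (S_1..S_n)_V \rrbracket^I(\Sigma) \Downarrow \Sigma'}
$$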


Interpretations enable us to define the meaning of skeletons by only specifying the parts that 
matter. Interpretations apply to any skeletons and are thus independent of the language. The rest 
of the paper presents different interpretations and their relations. 
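The way an interpretation plugs into the two generic rules can be sketched as follows in Python. Everything named here is an assumption made for illustration: the encoding of bones, the function `interpret`, and the toy interpretation `DefinedVars`, whose states are simply the sets of skeletal variables defined so far. The sketch also simplifies the branching rule by always taking the largest partial function from branch indices to results, and it assumes each relation yields at most one result (`None` standing for "no derivation").

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Hook:
    x_in: str
    term: object
    x_out: str


@dataclass(frozen=True)
class Filter:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Branches:
    bodies: Tuple[tuple, ...]
    shared: Tuple[str, ...]


def interpret(interp, body, state):
    """Generic rules for B;S and (S_1..S_n)_V, parameterised by `interp`."""
    if not body:
        return interp.empty(state)
    bone, rest = body[0], body[1:]
    if isinstance(bone, Hook):
        nxt = interp.hook(bone, state)
    elif isinstance(bone, Filter):
        nxt = interp.filter(bone, state)
    else:
        # Simplification: every branch that has a derivation contributes
        # its result, i.e. we take the largest partial function O.
        results = {}
        for i, branch in enumerate(bone.bodies, start=1):
            out = interpret(interp, branch, state)
            if out is not None:
                results[i] = out
        nxt = interp.merge(bone.shared, results, state)
    return None if nxt is None else interpret(interp, rest, nxt)


class DefinedVars:
    """Toy interpretation: a state is the set of variables defined so far."""

    def empty(self, state):
        return state

    def hook(self, bone, state):
        return state | {bone.x_out} if bone.x_in in state else None

    def filter(self, bone, state):
        ok = set(bone.inputs) <= state
        return state | set(bone.outputs) if ok else None

    def merge(self, shared, results, state):
        ok = results and all(set(shared) <= r for r in results.values())
        return state | set(shared) if ok else None


IF_BODY = (
    Hook("x_sigma", "x_t1", "x_f1"),
    Filter("isBool", ("x_f1",), ("x_f1'",)),
    Branches(
        bodies=(
            (Filter("isTrue", ("x_f1'",)), Hook("x_sigma", "x_t2", "x_o")),
            (Filter("isFalse", ("x_f1'",)), Hook("x_sigma", "x_t3", "x_o")),
        ),
        shared=("x_o",),
    ),
)

start = frozenset({"x_sigma", "x_t1", "x_t2", "x_t3"})
final = interpret(DefinedVars(), IF_BODY, start)
assert final == start | {"x_f1", "x_f1'", "x_o"}
```

Only the four methods of `DefinedVars` are specific to the interpretation; `interpret` never inspects the states it threads through, which is the sense in which the generic rules are independent of both the language and the interpretation.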
3.1 Well-Formedness Interpretation 

The first interpretation we consider is a well-formedness interpretation, to verify that every skeleton 
is well formed. More precisely, we verify that every skeletal variable used has been first defined, 
that every variable defined in a skeleton is fresh (with an exception for branches, see below), and 
that the sorting of filters, hooks, and branches are consistent. 

Intuitively, in the hook $H(x_{f_1}, t, x_{f_2})$, flow variable $x_{f_1}$ is used and flow variable $x_{f_2}$ is defined. 
Similarly, in the filter $F(x_1..x_n) \mathrel{?>} (y_1..y_m)$, skeletal variables $(x_1..x_n)$ are used and skeletal 
variables $(y_1..y_m)$ are defined. The case for branches $(S_1..S_n)_V$ is a bit more involved. First, each 
branch $S_i$ must define the skeletal variables in $V$. Second, every variable defined in the whole set of 
branches must be distinct, with the exception of the variables in $V$ as they have to be defined in 
every branch. And third, the only variables defined by the branches that may be used in the rest of 
the skeleton body are those in $V$. 

Assuming for each base sort a set of base terms, pairwise disjoint, we define the well-formedness 
(WF) interpretation in Figure 8. Its interpretation states and results are pairs of a sorting 
environment $\Gamma$, mapping term variables to base and program sorts and flow variables to flow 
sorts, and a set $D$ of skeletal variables that have been defined at that point. In this interpretation, 
we write $x : s$ to state that the kind of variable and sort match, namely term variables with base or 
program sorts, and flow variables with flow sorts. 

The interpretation for the empty skeleton body is trivial: it simply returns its arguments. The 
interpretation of a hook $H(x_{f_1}, t, x_{f_2})$ checks that $x_{f_1}$ is in $\Gamma$, that every term variable of $t$ is also in 
$\Gamma$, and that variable $x_{f_2}$ is fresh (i.e., not in $D$). In addition, it checks that the sort of $x_{f_1}$ is what $t$ 
expects as input sort, and it then binds $x_{f_2}$ to the output sort of $t$. 

The interpretation for a filter $F(x_1..x_n) \mathrel{?>} (y_1..y_m)$ is similar. It ensures that the input skeletal 
variables $(x_1..x_n)$ are in $\Gamma$, that the number and kind of both input and output variables match the 
signature of $F$, that the output variables are fresh, and that the sorts of the input variables correspond to 
the input signature of $F$; it then binds the output variables to the output signature of $F$. 

Finally, the interpretation of the merging of branches checks that every branch is well formed, 
that the variables in $V$ are exactly those shared by the branches (no fewer and no more), 
and that the sorting environments returned by the branches all agree when restricted to $V$. In that 
case, the returned sorting environment is the concatenation of the input environment and the one 
shared by the branches. The $n \geq 2$ constraint gives a more concise way of stating that the 
variables shared by the branches are exactly those in $V$. It is not a restriction: an empty set of 
branches is useless, since it offers no pathway and thus prevents the skeleton from being interpreted, and a 
singleton set of branches can be inlined. 
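The hook and filter checks described above can be sketched in Python as follows. This is a simplified illustration, not the paper's definition: the tables `FSORT`, `IN_SORT`, and `OUT_SORT` cover only part of the While example, the function names `wf_hook` and `wf_filter` are made up, the hooked term is assumed to be a single term variable, and the check that the kind of each variable matches the kind of its sort is omitted. Each function returns the updated pair of sorting environment and defined set, or `None` when the bone is rejected.

```python
# fsort(F) for a few filters of the running example, as (inputs, outputs).
FSORT = {
    "isBool": (("val",), ("bool",)),
    "isTrue": (("bool",), ()),
    "isInt": (("val",), ("int",)),
    "add": (("int", "int"), ("int",)),
    "intVal": (("int",), ("val",)),
}
# in(s) and out(s) for the program sorts of While.
IN_SORT = {"expr": "store", "stat": "store"}
OUT_SORT = {"expr": "val", "stat": "store"}


def wf_hook(gamma, defined, x_in, term_var, x_out):
    """Check H(x_in, term_var, x_out), where the hooked term is a term variable."""
    if x_in not in gamma or term_var not in gamma or x_out in defined:
        return None
    s = gamma[term_var]
    if s not in IN_SORT or gamma[x_in] != IN_SORT[s]:
        return None
    return {**gamma, x_out: OUT_SORT[s]}, defined | {x_out}


def wf_filter(gamma, defined, name, inputs, outputs):
    """Check name(inputs) ?> (outputs) against fsort(name)."""
    in_sorts, out_sorts = FSORT[name]
    if any(x not in gamma for x in inputs):
        return None
    if tuple(gamma[x] for x in inputs) != in_sorts:
        return None
    if len(outputs) != len(out_sorts) or len(set(outputs)) != len(outputs):
        return None
    if defined & set(outputs):
        return None
    return {**gamma, **dict(zip(outputs, out_sorts))}, defined | set(outputs)


# First two bones of the Add skeleton: H(x_sigma, x_t1, x_f1); isInt(x_f1) ?> x_f1'.
gamma0 = {"x_sigma": "store", "x_t1": "expr", "x_t2": "expr"}
d0 = frozenset(gamma0)
gamma1, d1 = wf_hook(gamma0, d0, "x_sigma", "x_t1", "x_f1")
gamma2, d2 = wf_filter(gamma1, d1, "isInt", ("x_f1",), ("x_f1'",))
assert gamma2["x_f1"] == "val" and gamma2["x_f1'"] == "int"

# Passing the unprojected value x_f1 (of sort val) to add is rejected.
assert wf_filter(gamma2, d2, "add", ("x_f1", "x_f1'"), ("x_f3",)) is None
# Redefining an already defined variable is rejected as well.
assert wf_hook(gamma2, d2, "x_sigma", "x_t2", "x_f1") is None
```

The rejected `add` call shows why the skeletons of Figure 6 project values with isInt before adding them: the sorting of filters forces the projection to appear explicitly.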

Let $t = c(t_1..t_n)$ be a closed term such that $Sort(t) = s$, where $s$ is a program sort. There are two 
ways to assign an output sort to $t$: directly, as $out(s)$, or using the WF interpretation of the skeleton 
for $c$ to compute the sort associated with $x_o$. If both coincide, we say the skeleton is well formed. 

Definition 3.1. A skeleton $\textsc{Name}(c(x_{t_1}..x_{t_n})) := S$ is well formed iff for any closed term $t = c(t_1..t_n)$ 
such that $Sort(t) = s$, we have $\llbracket S \rrbracket^{wf}(\Gamma, D) \Downarrow (\Gamma', D')$ and $\Gamma'(x_o) = out(s)$, with the initial sorting 
environment $\Gamma$ being $\{x_\sigma \mapsto in(s) + x_{t_1} \mapsto Sort(t_1)..x_{t_n} \mapsto Sort(t_n)\}$, and with $D = dom(\Gamma)$. 

In the following we only consider well-formed skeletons. For instance, the skeletons for While 
are well formed. 

3.2 Interpretation Consistency 

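We now define how to relate interpretations. Given interpretations $I_1$ and $I_2$, we assume a relation 
$OK_{st}(\Sigma_1, \Sigma_2)$ between the interpretation states, and a relation $OK_{out}(O_1, O_2)$ between their results. 
Intuitively, consistency is the propagation of these relations along interpretations. 

$$
\llbracket [] \rrbracket^{wf}(\Gamma, D) \Downarrow (\Gamma, D)
$$

$$
\frac{\begin{array}{c}
  x_{f_1} \in dom(\Gamma) \subseteq D \qquad Tvar(t) \subseteq dom(\Gamma) \qquad \Gamma(x_{f_1}) = in(Sort_\Gamma(t)) \\
  x_{f_2} \notin D \qquad \Gamma' = \Gamma + x_{f_2} \mapsto out(Sort_\Gamma(t)) \qquad D' = D \cup \{x_{f_2}\}
\end{array}}
{\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^{wf}(\Gamma, D) \Downarrow (\Gamma', D')}
$$

$$
\frac{\begin{array}{c}
  (x_1..x_n) \subseteq dom(\Gamma) \subseteq D \qquad (x_1..x_n) : (\Gamma(x_1)..\Gamma(x_n)) \\
  y_1..y_m \text{ are pairwise distinct} \qquad (y_1..y_m) \cap D = \emptyset \\
  fsort(F) = (\Gamma(x_1)..\Gamma(x_n)) \to (s_1..s_m) \qquad (y_1..y_m) : (s_1..s_m) \\
  \Gamma' = \Gamma + (y_1..y_m) \mapsto (s_1..s_m) \qquad D' = D \cup (y_1..y_m)
\end{array}}
{\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^{wf}(\Gamma, D) \Downarrow (\Gamma', D')}
$$

$$
\frac{\begin{array}{c}
  n \geq 2 \qquad dom(\Gamma) \subseteq D \\
  \forall i \in [1..n].\ \mathbf{O}(i) = (\Gamma_i, D_i) \qquad \forall i \in [1..n].\ dom(\Gamma_i) \subseteq D_i \\
  \forall i, j \in [1..n].\ i \neq j \Rightarrow (D_i \setminus D) \cap (D_j \setminus D) = V \\
  \forall i \in [1..n].\ \Gamma + \Gamma_i|_V = \Gamma' \qquad D' = \bigcup_{i \in [1..n]} D_i
\end{array}}
{\llbracket \oplus_n \rrbracket^{wf}_V(\mathbf{O}, (\Gamma, D)) \Downarrow (\Gamma', D')}
$$

Fig. 8. WF Interpretation 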

We define two kinds of consistency: one about where interpretations are defined, i.e., whether 
they return a result, and one about their results. 

Definition 3.2. Interpretation $I_1$ is existentially consistent with interpretation $I_2$ if for any $S$, $\Sigma_1$, 
$\Sigma_2$, and $O_1$ such that $OK_{st}(\Sigma_1, \Sigma_2)$ and $\llbracket S \rrbracket^{I_1}(\Sigma_1) \Downarrow O_1$, there exists an $O_2$ such that $\llbracket S \rrbracket^{I_2}(\Sigma_2) \Downarrow O_2$ 
and $OK_{out}(O_1, O_2)$. 

Definition 3.3. Interpretations $I_1$ and $I_2$ are universally consistent if for any $S$, $\Sigma_1$, $\Sigma_2$, $O_1$, and $O_2$, 
if $OK_{st}(\Sigma_1, \Sigma_2)$, $\llbracket S \rrbracket^{I_1}(\Sigma_1) \Downarrow O_1$ and $\llbracket S \rrbracket^{I_2}(\Sigma_2) \Downarrow O_2$, then $OK_{out}(O_1, O_2)$. 

3.3 Proving Consistency 

Both consistency properties can be stated at the level of the building blocks of interpretations. 
Formally, we have the following two lemmas. 

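Lemma 3.4. Let $I_1$ and $I_2$ be two interpretations, $OK_{st}$ a relation between their input states, and 
$OK_{out}$ a relation between their output states. If for any $\Sigma_1$ and $\Sigma_2$ such that $OK_{st}(\Sigma_1, \Sigma_2)$ we have 

(1) $\llbracket [] \rrbracket^{I_1}(\Sigma_1) \Downarrow O_1 \Rightarrow \exists O_2.\ \llbracket [] \rrbracket^{I_2}(\Sigma_2) \Downarrow O_2 \wedge OK_{out}(O_1, O_2)$ 

(2) $\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^{I_1}(\Sigma_1) \Downarrow \Sigma'_1 \Rightarrow \exists \Sigma'_2.\ \llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^{I_2}(\Sigma_2) \Downarrow \Sigma'_2 \wedge OK_{st}(\Sigma'_1, \Sigma'_2)$ 

(3) $\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^{I_1}(\Sigma_1) \Downarrow \Sigma'_1 \Rightarrow$ 
$\exists \Sigma'_2.\ \llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^{I_2}(\Sigma_2) \Downarrow \Sigma'_2 \wedge OK_{st}(\Sigma'_1, \Sigma'_2)$ 

(4) $dom(\mathbf{O}_1) = dom(\mathbf{O}_2) \subseteq \{1..n\} \wedge \forall i \in dom(\mathbf{O}_1).\ OK_{out}(\mathbf{O}_1(i), \mathbf{O}_2(i))$ 
$\wedge\ \llbracket \oplus_n \rrbracket^{I_1}_V(\mathbf{O}_1, \Sigma_1) \Downarrow \Sigma'_1 \Rightarrow \exists \Sigma'_2.\ \llbracket \oplus_n \rrbracket^{I_2}_V(\mathbf{O}_2, \Sigma_2) \Downarrow \Sigma'_2 \wedge OK_{st}(\Sigma'_1, \Sigma'_2)$ 

then $I_1$ is existentially consistent with $I_2$. 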

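Lemma 3.5. Let $I_1$ and $I_2$ be two interpretations, and $OK_{st}$ a relation between their input states. If 
for any $\Sigma_1$ and $\Sigma_2$ such that $OK_{st}(\Sigma_1, \Sigma_2)$ we have 

(1) $\llbracket [] \rrbracket^{I_1}(\Sigma_1) \Downarrow O_1 \wedge \llbracket [] \rrbracket^{I_2}(\Sigma_2) \Downarrow O_2 \Rightarrow OK_{out}(O_1, O_2)$ 

(2) $\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^{I_1}(\Sigma_1) \Downarrow \Sigma'_1 \wedge \llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^{I_2}(\Sigma_2) \Downarrow \Sigma'_2 \Rightarrow OK_{st}(\Sigma'_1, \Sigma'_2)$ 

(3) $\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^{I_1}(\Sigma_1) \Downarrow \Sigma'_1 \wedge \llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^{I_2}(\Sigma_2) \Downarrow \Sigma'_2 \Rightarrow OK_{st}(\Sigma'_1, \Sigma'_2)$ 

(4) $dom(\mathbf{O}_1) \subseteq \{1..n\} \wedge dom(\mathbf{O}_2) \subseteq \{1..n\} \wedge \forall i \in dom(\mathbf{O}_1) \cap dom(\mathbf{O}_2).\ OK_{out}(\mathbf{O}_1(i), \mathbf{O}_2(i))$ 
$\wedge\ \llbracket \oplus_n \rrbracket^{I_1}_V(\mathbf{O}_1, \Sigma_1) \Downarrow \Sigma'_1 \wedge \llbracket \oplus_n \rrbracket^{I_2}_V(\mathbf{O}_2, \Sigma_2) \Downarrow \Sigma'_2 \Rightarrow OK_{st}(\Sigma'_1, \Sigma'_2)$ 

then $I_1$ and $I_2$ are universally consistent. 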

4 CONCRETE INTERPRETATION 

We now define an interpretation used to compute a big-step evaluation semantics in the form of a 
triple set: a set of triples (also called judgements) of the form (state, term, result). For each base sort 
we assume a set of base terms, pairwise disjoint, and for each flow sort a set of values. We write 
t : s to state that base term t has base sort s, and v : s to state that value v has flow sort s. 

For each filter $F(x_1..x_n) \mathrel{?>} (y_1..y_m)$ such that $fsort(F) = (s_1..s_n) \to (s'_1..s'_m)$, we assume an 
interpretation $\llbracket F \rrbracket$ which is a relation between elements of $(s_1..s_n)$ and elements of $(s'_1..s'_m)$. We 
write $\llbracket F \rrbracket(v_1..v_n) \Downarrow (v'_1..v'_m)$ to state that it relates $(v_1..v_n)$ to $(v'_1..v'_m)$. 

The input state of a concrete interpretation is a pair comprising 

• an environment $\Sigma$ mapping term variables to closed terms and flow variables to values, 

• a set $T$ of triples of value, closed term, and value, representing already known judgements 
and used to give meaning to the sub-derivations $H(x_{f_1}, t, x_{f_2})$. 

The interpretation result maps term variables to closed terms and flow variables to values. 

We define the concrete interpretation in Figure 9. For the empty skeleton body, it simply returns 
its environment. For a hook $H(x_{f_1}, t, x_{f_2})$, it looks up in the triple set a known computation for 
$\Sigma(x_{f_1})$ and $\Sigma(t)$ whose result is $v$, and it continues binding $x_{f_2}$ to $v$. Note that if the language is 
non-deterministic, there may be several such values and one is picked. For a filter $F$, one uses its 
interpretation with the input $(\Sigma(x_1)..\Sigma(x_n))$. As filter interpretations are relations, there may be 
several results as well. Finally, to merge branches, the interpretation picks a branch that successfully 
returned a result and extends its environment accordingly. 

$$
\llbracket [] \rrbracket(\Sigma, T) \Downarrow \Sigma
\qquad\qquad
\frac{(\Sigma(x_{f_1}), \Sigma(t), v) \in T \qquad \Sigma' = \Sigma + x_{f_2} \mapsto v}
     {\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket(\Sigma, T) \Downarrow (\Sigma', T)}
$$

$$
\frac{\llbracket F \rrbracket(\Sigma(x_1)..\Sigma(x_n)) \Downarrow (v_1..v_m) \qquad \Sigma' = \Sigma + y_1 \mapsto v_1..y_m \mapsto v_m}
     {\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket(\Sigma, T) \Downarrow (\Sigma', T)}
$$

$$
\frac{\mathbf{O}(i) = \Sigma_i \qquad V \subseteq dom(\Sigma_i) \qquad \Sigma' = \Sigma + \Sigma_i|_V}
     {\llbracket \oplus_n \rrbracket_V(\mathbf{O}, (\Sigma, T)) \Downarrow (\Sigma', T)}
$$

Fig. 9. Concrete Interpretation 

Running Example. We instantiate the base sort ident with strings and lit with integers. We 
instantiate the flow sort int with integers, bool with Booleans, val with the disjoint union int + bool, 
and store with partial functions from strings to val. The concrete interpretations of the filters are 
the following partial functions: 

• litInt is the identity on integers; 
• intVal and boolVal inject their arguments into int + bool; 
• read(id, st) applies st to id (since st is a partial function, it may not return a result); 
• isInt(v) matches v in the disjoint union int + bool and returns v if it is in int; 
• add($i_1$, $i_2$) returns the integer addition of $i_1$ and $i_2$; 
• eq($i_1$, $i_2$) returns true if $i_1 = i_2$, and false otherwise; 
• isBool(v) matches v in the disjoint union int + bool and returns v if it is in bool; 
• write(id, st, v) returns the partial function mapping id to v and any other id' to st(id'); 
• id(st) returns st; 
• isTrue(b) returns () if b = true; 
• isFalse(b) returns () if b = false. 

In the rest of the paper, we directly write x for var(x) and n for const(n) in the examples. 

4.1 Consistency of WF and Concrete Interpretations 

Definition 4.1. We say a triple set $T$ is well formed if all its elements are well formed, i.e., if
$(\sigma, t, v) \in T$, then $t = c(t_1..t_n)$ and there is a sort $s$ such that $Sort(t) = s$, $\sigma : in(s)$, and $v : out(s)$.

We define $OK_{st}((\Gamma, D), (\Sigma, T))$ as follows: $T$ is well formed, $dom(\Gamma) = dom(\Sigma)$, and for any
$x \in dom(\Gamma)$ we have $\Sigma(x) : \Gamma(x)$. We define $OK_{out}((\Gamma, D), \Sigma)$ as follows: $dom(\Gamma) = dom(\Sigma)$ and for
any $x \in dom(\Gamma)$ we have $\Sigma(x) : \Gamma(x)$.

Lemma 4.2. The well-formedness and concrete interpretations are universally consistent. 

4.2 Concrete Derivations 

The concrete interpretation describes how skeletons can be interpreted from a set of hooks. The
immediate consequence $\mathcal{H}$ describes how skeletons can be assembled. It starts from a set of
well-formed triples (that is, of Hoare triples) $T$, and derives a new set of judgements using the concrete
interpretation. Intuitively, from the set of triples generated by derivations of depth at most $n$, it
builds the set of triples generated by derivations of depth at most $n + 1$. It is defined as follows.

$$\mathcal{H}(T) = \left\{ (\sigma, t, v) \;\middle|\; \begin{array}{l} t = c(t_1..t_n) \wedge Sort(t) = s \\ Name(c(x_{t_1}..x_{t_n})) := S \in Rules \\ \sigma : in(s) \\ \Sigma = x_\sigma \mapsto \sigma + x_{t_1} \mapsto t_1 .. x_{t_n} \mapsto t_n \\ \llbracket S \rrbracket(\Sigma, T) \Downarrow \Sigma' \\ \Sigma'(x_o) = v \end{array} \right\}$$

Lemma 4.3. The functional $\mathcal{H}$ is monotone.

Proof. This is immediate by inspecting the interpretation of skeletal bodies, as the only one
where $T$ is used is for hooks, and a bigger $T$ does not remove results. □

Lemma 4.4. If $T$ is a well-formed triple set, then $\mathcal{H}(T)$ is a well-formed triple set.

We now show that the smallest fixpoint of $\mathcal{H}$ corresponds to the set of triples generated by any
finite derivation, or in other words, to an inductive definition of the concrete rules.

Lemma 4.5. $\mathcal{H}$ is continuous: for any increasing sequence of triple sets $(T_i)$, we have
$\bigcup_i \mathcal{H}(T_i) = \mathcal{H}(\bigcup_i T_i)$.

Proof. We prove this result by double inclusion. The inclusion $\bigcup_i \mathcal{H}(T_i) \subseteq \mathcal{H}(\bigcup_i T_i)$ follows
from the monotonicity of $\mathcal{H}$. To show that $\mathcal{H}(\bigcup_i T_i) \subseteq \bigcup_i \mathcal{H}(T_i)$, we show that for every triple set $T$
and $(\sigma, t, v) \in \mathcal{H}(T)$, there exists a finite subset $T'$ of $T$ such that $(\sigma, t, v) \in \mathcal{H}(T')$. This result is
immediate by induction over the structure of the skeleton $S$. Then, for each $(\sigma, t, v) \in \mathcal{H}(\bigcup_i T_i)$,
there exists a finite subset $T'$ of $\bigcup_i T_i$ such that $(\sigma, t, v) \in \mathcal{H}(T')$. As $T'$ is finite and $(T_i)$ is increasing,
there exists $n$ such that $T' \subseteq T_n$. We conclude by monotonicity of $\mathcal{H}$. □

Definition 4.6. The concrete semantics $\Downarrow$ is the smallest fixpoint of $\mathcal{H}$.

Lemma 4.7. We have $\Downarrow = \bigcup_n \mathcal{H}^n(\emptyset)$.

Proof. The set of triple sets ordered by inclusion is a CPO, and $\mathcal{H}$ is continuous on this CPO.
We conclude by the Kleene fixpoint theorem. □

Lemma 4.8. The concrete semantics $\Downarrow$ is well formed.

Proof. Let $(\sigma, t, v) \in \Downarrow$. By Lemma 4.7, there exists a finite number $n$ such that $(\sigma, t, v) \in \mathcal{H}^n(\emptyset)$.
We prove by induction on $n$ that $(\sigma, t, v)$ has the expected properties. It is immediate for 0, and for
$n + 1$ we simply apply Lemma 4.4. □
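
The characterisation of the smallest fixpoint as a union of finite iterates can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the operator `step` is a toy stand-in for an immediate consequence operator (a single term `loop` that counts a bounded integer state down to 0), not the operator of While, and the iteration terminates only because this toy universe is finite.

```python
def least_fixpoint(step):
    """Iterate step from the empty set until the triple set stabilises."""
    current = frozenset()
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt


BOUND = 3  # hypothetical bound keeping the set of states finite


def step(triples):
    """Toy monotone operator: (0, loop, 0) is an axiom, and a judgement
    for state n yields one for state n + 1 (one more loop iteration)."""
    out = {(0, "loop", 0)}
    for (state, term, result) in triples:
        if state < BOUND:
            out.add((state + 1, term, result))
    return out


SEMANTICS = least_fixpoint(step)
```

Each round of the loop corresponds to derivations that are one level deeper, so a triple for state $n$ first appears after $n + 1$ applications of `step`.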

5 ABSTRACT INTERPRETATION 

This section describes how a set of skeletons defining a programming language can be re-interpreted 
over an abstract domain of properties to obtain an abstract interpretation of the language. 

5.1 Abstract Domains 

An abstract interpretation of a set of skeletons must define abstract domains for all the terms and 
flow sorts used in the skeleton bodies, ending with abstract semantic states and abstract results. 

Elements in the abstract domains represent sets of values in the corresponding concrete domain
(they are related through the concretion function $\gamma$ introduced below). The abstract interpretation
framework is designed to be parametric in the choice of abstract domains for base values such as
integers, Booleans, and program states. All we require is that each abstract domain for sort $s$ is a
partial order $\sqsubseteq$ with a least element, denoted $\bot_s$, representing the empty set. For example, the lattice
of intervals can be used as an abstract domain for integers, with $\bot_{int}$ being the empty interval.
Similarly, a state that maps program variables to integer values can be abstracted as a mapping from
variables to intervals, or as a polyhedron that defines linear relations between program variables.
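
To make the requirement concrete, here is a minimal Python sketch of the interval domain seen as a partial order with a least element, together with the pointwise order on stores. The representation (a pair of bounds, `None` for the empty interval) and the function names are assumptions of this example.

```python
import math

BOT = None  # the empty interval, least element of the domain


def interval(lo, hi):
    """Build the interval [lo, hi]; it is empty when hi < lo."""
    return BOT if hi < lo else (lo, hi)


def leq(a, b):
    """Interval inclusion: a is at least as precise as b."""
    if a is BOT:
        return True
    if b is BOT:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def store_leq(s1, s2):
    """Pointwise order on abstract stores over the same variables."""
    return s1.keys() == s2.keys() and all(leq(s1[x], s2[x]) for x in s1)
```

For instance, `leq(interval(1, 2), interval(0, math.inf))` holds, while `leq(interval(0, 5), interval(1, 2))` does not.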

Skeletal variables can also range over terms. For each program or base sort $s$, we define an
abstract domain by imposing a flat partial order on the set of terms of that sort (i.e., we relate a
term to itself and no other term) and by adding a $\bot_s$ element, smaller than all terms of that sort.
Abstract base terms include every concrete base term; they may also include additional terms that
denote sets of concrete base terms. To ease notation, we sometimes omit the sort in $\bot$ in an equality.
In this case, $v^\# = \bot$ should be read $v^\# = \bot_{sort(v^\#)}$, and $v^\# \neq \bot$ should be read $v^\# \neq \bot_{sort(v^\#)}$.

5.2 Abstract Interpretation of Skeletons 

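In addition to the abstract domains, an abstract interpretation must specify its input and output
states, and how the empty skeleton body, hooks, filters, and the merging of branches are interpreted.

$$\llbracket [\,] \rrbracket^\#(f, \Sigma^\#, T^\#) \Downarrow (f, \Sigma^\#)$$

$$\frac{\Sigma^{\#\prime} = \Sigma^\# + x_{f_2} \mapsto \bot_{out(Sort(\Sigma^\#(t)))}}{\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^\#(\bot, \Sigma^\#, T^\#) \Downarrow (\bot, \Sigma^{\#\prime}, T^\#)}$$

$$\frac{\Sigma^\#(x_{f_1}) \sqsubseteq \sigma^\# \qquad \Sigma^\#(t) \sqsubseteq t^\# \qquad (\sigma^\#, t^\#, \bot) \in T^\# \qquad \Sigma^{\#\prime} = \Sigma^\# + x_{f_2} \mapsto \bot_{out(Sort(\Sigma^\#(t)))}}{\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^\#(\top, \Sigma^\#, T^\#) \Downarrow (\bot, \Sigma^{\#\prime}, T^\#)}$$

$$\frac{\Sigma^\#(x_{f_1}) \sqsubseteq \sigma^\# \qquad \Sigma^\#(t) \sqsubseteq t^\# \qquad (\sigma^\#, t^\#, v^\#) \in T^\# \qquad v^\# \sqsubseteq v^{\#\prime} \qquad \Sigma^{\#\prime} = \Sigma^\# + x_{f_2} \mapsto v^{\#\prime}}{\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^\#(\top, \Sigma^\#, T^\#) \Downarrow (\top, \Sigma^{\#\prime}, T^\#)}$$

$$\frac{fsort(F) = (s_1..s_n) \to (s'_1..s'_m) \qquad \Sigma^{\#\prime} = \Sigma^\# + y_1 \mapsto \bot_{s'_1} .. y_m \mapsto \bot_{s'_m}}{\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^\#(\bot, \Sigma^\#, T^\#) \Downarrow (\bot, \Sigma^{\#\prime}, T^\#)}$$

$$\frac{(\Sigma^\#(x_1)..\Sigma^\#(x_n)) \sqsubseteq (v_1^\#..v_n^\#) \qquad \llbracket F \rrbracket^\#(v_1^\#..v_n^\#) = \bot \qquad fsort(F) = (s_1..s_n) \to (s'_1..s'_m) \qquad \Sigma^{\#\prime} = \Sigma^\# + y_1 \mapsto \bot_{s'_1} .. y_m \mapsto \bot_{s'_m}}{\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^\#(\top, \Sigma^\#, T^\#) \Downarrow (\bot, \Sigma^{\#\prime}, T^\#)}$$

$$\frac{(\Sigma^\#(x_1)..\Sigma^\#(x_n)) \sqsubseteq (v_1^\#..v_n^\#) \qquad \llbracket F \rrbracket^\#(v_1^\#..v_n^\#) \sqsubseteq (w_1^\#..w_m^\#) \qquad \Sigma^{\#\prime} = \Sigma^\# + y_1 \mapsto w_1^\# .. y_m \mapsto w_m^\#}{\llbracket F(x_1..x_n) \mathrel{?>} (y_1..y_m) \rrbracket^\#(\top, \Sigma^\#, T^\#) \Downarrow (\top, \Sigma^{\#\prime}, T^\#)}$$

$$\frac{n \geq 1 \qquad \forall i \in [1..n].\, O(i) = (\bot, \Sigma_i^\#) \qquad \forall i \in [1..n].\, V \subseteq dom(\Sigma_i^\#) \qquad \forall i, j.\, \Sigma_i^\#|_V = \Sigma_j^\#|_V \qquad \Sigma^{\#\prime} = \Sigma^\# + \Sigma_i^\#|_V}{\oplus^\#_V(f, O, (\Sigma^\#, T^\#)) \Downarrow (\bot, \Sigma^{\#\prime}, T^\#)}$$

$$\frac{dom(O) = [1..n] \qquad \mathcal{E} = \{\Sigma_i^\# \mid O(i) = (\top, \Sigma_i^\#)\} \neq \emptyset \qquad \forall \Sigma_i^\# \in \mathcal{E}.\, V \subseteq dom(\Sigma_i^\#) \qquad \Sigma_i^\# \in \mathcal{E} \Rightarrow \Sigma^{\#\prime} = \Sigma^\# + \Sigma_i^\#|_V}{\oplus^\#_V(\top, O, (\Sigma^\#, T^\#)) \Downarrow (\top, \Sigma^{\#\prime}, T^\#)}$$

Fig. 10. Abstract Interpretation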


For each filter symbol $F$ of signature $(s_1..s_n) \to (s'_1..s'_m)$ we assume a total function $\llbracket F \rrbracket^\#$ from the
domain corresponding to $(s_1..s_n)$ to the domain corresponding to $(s'_1..s'_m)$. A filter interpretation
may return $\bot$ to state that it is not defined for that input.

The input state of an abstract interpretation is a triple $(f, \Sigma^\#, T^\#)$ comprising a flag $f$, an abstract
environment $\Sigma^\#$ (mapping skeletal variables to abstract terms and values), and a set of abstract
semantic triples $T^\#$ that gives semantics to hooks. A flag is either $\bot$ or $\top$, and it indicates whether
it has been determined that the current skeleton does not apply ($\bot$) or that it may still apply ($\top$).
The output state of an abstract interpretation is a flag and an abstract environment where skeletal
variables hold the result of the abstract interpretation. Figure 10 defines the abstract semantics.

The abstract interpretation of an empty list of hypotheses just returns the flag and environment
from its input. There are three cases for the interpretation of a hook $H(x_{f_1}, t, x_{f_2})$. If we have
determined that the skeleton does not apply, we set $x_{f_2}$ to $\bot$ of the correct sort. In the two other
cases, we need to have a triple $(\sigma^\#, t^\#, v^\#)$ from $T^\#$ such that $\Sigma^\#(x_{f_1}) \sqsubseteq \sigma^\#$ and $\Sigma^\#(t) \sqsubseteq t^\#$. This loss
of precision gives some flexibility for such a derivation. We then have two (non-exclusive) cases: if
$v^\# = \bot$, then we know the skeleton does not apply, and we set the flag to $\bot$ and $x_{f_2}$ to the appropriate
$\bot$. For the last case, we do not restrict what $v^\#$ is (it may still be $\bot$), and we bind $x_{f_2}$ in the resulting
environment to some $v^{\#\prime}$ that may be less precise than $v^\#$, again to gain flexibility.

The abstract interpretation of a filter $F(x_1..x_n) \mathrel{?>} (y_1..y_m)$ also has three cases. If we know the
skeleton does not apply, we just bind the output variables $(y_1..y_m)$ to the appropriate $\bot$ depending
on the signature of $F$. Otherwise, we apply the filter interpretation to an approximation of the
arguments as given by the environment. If the result is $\bot$, we know the skeleton does not apply
and switch the flag to $\bot$, as well as extend the environment with $\bot$ of the correct sort. Otherwise,
we keep the flag as $\top$ and extend the environment with an approximation of the result of the filter.

For the merging operator, we interpret every possible branch and collect their results in $O$. If all
branches have the $\bot$ flag (either because the $\bot$ flag was set before their interpretation, which would
then be propagated, or because they newly returned it), then the skeleton does not apply and we set
the flag accordingly, extending the environment with mappings from the shared skeletal variables
$V$ to $\bot$ of the correct sort. Otherwise, we collect all branches that have a $\top$ flag. They must all return
abstract environments that agree on the shared variables (which is why the approximations in the
filter and hook cases are useful, to ensure this is possible), and we extend the current environment
with this common environment.

The key difference between the abstract and concrete interpretations is how the different results 
are merged in case of branching. The concrete semantics picks one of them, whereas the abstract 
semantics requires all branches that provided a result to agree. This is because the goal of the 
abstract semantics is to infer abstract semantic triples that are valid statements about all possible 
resulting states, i.e., about all possible concrete choices in case of branching. 
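
The merging of branches described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the definition of Figure 10: flags are the strings `"bot"` and `"top"`, environments are dictionaries, `bottom_of` is a hypothetical helper giving the least element for a variable, and returning `None` stands for "no rule applies until the branches are weakened to a common value".

```python
BOT_FLAG, TOP_FLAG = "bot", "top"


def merge(env, outcomes, shared, bottom_of):
    """Merge branch outcomes, a list of (flag, environment) pairs, over the
    shared variables. Returns (flag, environment), or None when the live
    branches disagree on a shared variable."""
    live = [e for (flag, e) in outcomes if flag == TOP_FLAG]
    if not live:
        # Every branch was discarded: the skeleton does not apply.
        return BOT_FLAG, {**env, **{x: bottom_of(x) for x in shared}}
    common = {x: live[0][x] for x in shared}
    if any({x: e[x] for x in shared} != common for e in live[1:]):
        return None
    return TOP_FLAG, {**env, **common}
```

A branch flagged `"bot"` places no constraint on the result, which is what makes the rule for a conditional whose guard is known to be true precise.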

5.3 Consistency of WF and Abstract Interpretations 

We define $OK_{st}((\Gamma, D), (f, \Sigma^\#, T^\#))$ as follows: $T^\#$ is well formed, $dom(\Gamma) = dom(\Sigma^\#)$, and for any
$x \in dom(\Gamma)$ we have $\Sigma^\#(x) : \Gamma(x)$. We define $OK_{out}((\Gamma, D), (f, \Sigma^\#))$ as follows: $dom(\Gamma) = dom(\Sigma^\#)$
and for any $x \in dom(\Gamma)$ we have $\Sigma^\#(x) : \Gamma(x)$.

Lemma 5.1. The well-formedness and abstract interpretations are universally consistent. 

5.4 Abstract Derivations 

We define the abstract immediate consequence operator from well-formed triple sets to triple sets in
Figure 11. As in the concrete case, the immediate consequence describes how to assemble skeletons.

Lemma 5.2. The functional $\mathcal{H}^\#$ is monotonic.

Proof. This is immediate by inspecting the interpretation of skeletal bodies, as the only one
where $T^\#$ is used is for hooks, and a bigger $T^\#$ does not remove results. □

$$\mathcal{H}^\#(T^\#) = \left\{ (\sigma^\#, t^\#, v^\#) \;\middle|\; \begin{array}{l} t^\# = c(t_1^\#..t_n^\#) \wedge Sort(t^\#) = s \\ Name(c(x_{t_1}..x_{t_n})) := S \in Rules \\ \sigma^\# : in(s) \\ \Sigma^\# = x_\sigma \mapsto \sigma^\# + x_{t_1} \mapsto t_1^\# .. x_{t_n} \mapsto t_n^\# \\ \llbracket S \rrbracket^\#(\top, \Sigma^\#, T^\#) \Downarrow (f, \Sigma^{\#\prime}) \\ \Sigma^{\#\prime}(x_o) = v^\# \end{array} \right\}$$

Fig. 11. The abstract immediate consequence operator


Lemma 5.3. If $T^\#$ is a well-formed triple set, then $\mathcal{H}^\#(T^\#)$ is a well-formed triple set.

An abstract semantics $\Downarrow^\#$ is a set of facts of the form $(\sigma^\#, t^\#, v^\#)$ stating that from state $\sigma^\#$ term
$t^\#$ evaluates to $v^\#$. A correct abstract semantics is one where such triples correspond to triples in
the concrete semantics (see Section 5.5). The more facts an abstract semantics contains, the more
useful it is, as it provides more information about the behaviour of terms. Hence, we choose as
abstract semantics the one with the most facts, i.e., the greatest fixpoint of $\mathcal{H}^\#$. This choice provides a
proof technique: since the greatest fixpoint is the union of all sets such that $T^\# \subseteq \mathcal{H}^\#(T^\#)$, to prove
that a fact is correct, one can propose a candidate set $T^\#$ containing this fact, and then show that
$T^\# \subseteq \mathcal{H}^\#(T^\#)$. This amounts to proving that the facts $T^\#$ constitute an invariant of the semantics. If
we were to translate such invariants into a derivation, the resulting derivation may be infinite.²

We could also define the abstract semantics $\Downarrow^\#$ as the smallest fixpoint of $\mathcal{H}^\#$. This would be
sound but, having fewer facts, we would then miss valuable abstract results. More precisely, if a
triple $(\sigma^\#, t, v^\#)$ belongs to the smallest fixpoint of $\mathcal{H}^\#$, then (as abstract triple sets form a CPO
ordered by inclusion and $\mathcal{H}^\#$ is continuous on this CPO) there exists a finite number $n$ such that
$(\sigma^\#, t, v^\#) \in \mathcal{H}^{\#n}(\emptyset)$. In other words, there exists a finite abstract derivation yielding the triple
$(\sigma^\#, t, v^\#)$. This implies that for all concrete states $\sigma \in \gamma(\sigma^\#)$, the program $t$ terminates. We would
thus have lost all facts for which the abstract semantics cannot prove termination. Defining the
abstract semantics as the greatest fixpoint of $\mathcal{H}^\#$ solves this issue.

Definition 5.4. The abstract semantics $\Downarrow^\#$ is the largest fixpoint of $\mathcal{H}^\#$ as a function from
well-formed triple sets to well-formed triple sets. This restriction is well defined by Lemma 5.3.

Lemma 5.5. $\Downarrow^\#$ is well formed.

Proof. As $\Downarrow^\#$ is the largest fixpoint of $\mathcal{H}^\#$, it is the union of all well-formed triple sets $T^\#$ such
that $T^\# \subseteq \mathcal{H}^\#(T^\#)$. Let $(\sigma^\#, t^\#, v^\#) \in \Downarrow^\#$; there is $T^\# \subseteq \mathcal{H}^\#(T^\#)$ where $(\sigma^\#, t^\#, v^\#) \in T^\#$ and $T^\#$ is well
formed. Hence $(\sigma^\#, t^\#, v^\#)$ has the requested properties. □
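
The proof technique associated with the greatest fixpoint, namely exhibiting a candidate set $T^\#$ with $T^\# \subseteq \mathcal{H}^\#(T^\#)$, can be illustrated by a small Python sketch. The operator `step` below is a toy assumption made for this example (one abstract state `"s"` and a diverging term `loop`); it is not the abstract operator of While.

```python
def is_post_fixpoint(candidate, step):
    """Check that every fact of the candidate is re-derived from the candidate."""
    return set(candidate) <= set(step(candidate))


def step(triples):
    """Toy consequence operator: the guard and the body are axioms, and a
    judgement for `loop` is derivable only from a judgement for `loop`."""
    out = {("s", "true", "true#"), ("s", "skip", "s")}
    if out <= set(triples):
        for result in ("s", "bot"):
            if ("s", "loop", result) in triples:
                out.add(("s", "loop", result))
    return out


CANDIDATE = {("s", "true", "true#"), ("s", "skip", "s"), ("s", "loop", "bot")}
```

Here `CANDIDATE` is an invariant, so its facts belong to the greatest fixpoint, whereas iterating `step` from the empty set never produces a triple for `loop`: the only derivation of that fact is infinite.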

5.5 Consistency of Concrete and Abstract Interpretations 

We assume a concretion function $\gamma$ for the abstract domain, from abstract terms to sets of concrete
terms, and from abstract values to sets of concrete values. We impose several constraints on $\gamma$. First,
$\gamma$ must be compatible with $\sqsubseteq$: if $t \in \gamma(t^\#)$ and $t^\# \sqsubseteq t^{\#\prime}$, then $t \in \gamma(t^{\#\prime})$, and if $v \in \gamma(v^\#)$ and $v^\# \sqsubseteq v^{\#\prime}$,
then $v \in \gamma(v^{\#\prime})$. Second, for any abstract term $t^\#$ of sort $s$, the set $\gamma(t^\#)$ must only contain terms of
sort $s$. In addition, $\gamma(c(t_1^\#..t_n^\#)) = \{c(t_1..t_n) \mid t_i \in \gamma(t_i^\#)\}$. Conversely, for any concrete term $t$, we have
$\gamma(t) = \{t\}$, as abstract base terms are extensions of concrete base terms.

² See Figure 10 of [Schmidt 1997a] for an example of such a representation.

Lemma 5.6. Let $\Sigma$ be a mapping from term variables to concrete terms, and $\Sigma^\#$ be a mapping from
term variables to abstract terms. If $Tvar(t) \subseteq dom(\Sigma)$, $Tvar(t) \subseteq dom(\Sigma^\#)$, and $\forall x_t \in Tvar(t)$, $\Sigma(x_t) \in
\gamma(\Sigma^\#(x_t))$, then $\Sigma(t) \in \gamma(\Sigma^\#(t))$.

Proof. By induction on the structure of $t$. If it is a base term, then the result holds by hypothesis
on base terms; if it is a term variable, then the result is immediate; otherwise we prove the
property by induction on the subterms. □

Regarding values, we have similar restrictions: for any abstract value $v^\#$ of sort $s$, the concrete
values in $\gamma(v^\#)$ all have sort $s$. We also require the abstract interpretation of filters to be consistent
with the concrete one: if $\llbracket F \rrbracket(v_1..v_n) \Downarrow (v'_1..v'_m)$ and $\forall i \in [1..n].\, v_i \in \gamma(v_i^\#)$, then
$\llbracket F \rrbracket^\#(v_1^\#..v_n^\#) = (v_1^{\#\prime}..v_m^{\#\prime})$ and $\forall i \in [1..m].\, v'_i \in \gamma(v_i^{\#\prime})$. In particular, if the concrete filter
relates its input to an output, the abstract filter cannot return $\bot$.

Definition 5.7. Let $T$ be a concrete triple set and $T^\#$ an abstract triple set. We say they are consistent
if for any $(\sigma, t, v) \in T$ and $(\sigma^\#, t^\#, v^\#) \in T^\#$, if $\sigma \in \gamma(\sigma^\#)$ and $t \in \gamma(t^\#)$, then $v \in \gamma(v^\#)$.

We define $OK_{st}((\Sigma, T), (f, \Sigma^\#, T^\#))$ as follows: $f = \top$, $dom(\Sigma) = dom(\Sigma^\#)$, for any $x \in dom(\Sigma)$,
we have $\Sigma(x) \in \gamma(\Sigma^\#(x))$, and $T$ and $T^\#$ are well formed and consistent. We define $OK_{out}(\Sigma, (f, \Sigma^\#))$
as follows: $f = \top$, $dom(\Sigma) = dom(\Sigma^\#)$, and for any $x \in dom(\Sigma)$, we have $\Sigma(x) \in \gamma(\Sigma^\#(x))$.

Lemma 5.8. The concrete and abstract interpretations are universally consistent. 

Lemma 5.9. Let $T$ and $T^\#$ be well formed and consistent triple sets. Then $\mathcal{H}(T)$ and $\mathcal{H}^\#(T^\#)$ are well
formed and consistent triple sets.

We finally show that the abstract semantics is correct relative to the concrete semantics. In a
nutshell, for any triple in the concrete semantics $\Downarrow$ (the smallest fixpoint of $\mathcal{H}$) and any triple in
the abstract semantics $\Downarrow^\#$ (the largest fixpoint of $\mathcal{H}^\#$), if the input states and terms are related, then
the output values are related. Formally, we have the following.

Definition 5.10. An abstract triple set $T^\#$ is correct if it is well formed and consistent with $\Downarrow$.

Theorem 5.11. $\Downarrow^\#$ is correct.

Proof. We prove by induction on $k$ that $\mathcal{H}^k(\emptyset)$ and $\Downarrow^\#$ are well formed and consistent. That
$\Downarrow^\#$ is well formed is simply Lemma 5.5.

The result is immediate for $k = 0$ since $\emptyset$ is well formed, and there is nothing else to check.

Let $k = n + 1$. By induction, $\mathcal{H}^n(\emptyset)$ and $\Downarrow^\#$ are well formed and consistent. By Lemma 5.9,
$\mathcal{H}^{n+1}(\emptyset)$ and $\mathcal{H}^\#(\Downarrow^\#) = \Downarrow^\#$ are well formed and consistent, as required.

To conclude, we apply Lemma 4.7. □

Note that in the previous theorem we only use the fact that $\Downarrow^\#$ is a fixpoint: it does not have to
be the greatest fixpoint.

5.6 Example: Interval analysis of While 

To give a concrete example of an abstract interpretation we design a value analysis of the language 
While in the style of Schmidt’s Abstract Interpretation of Natural Semantics [Schmidt 1995]. We 
have the following flow sorts in the semantic definition (cf. Figure 7): int, bool, val, and store. 

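We describe an analysis in which integers are approximated by intervals, ordered by inclusion.
Writing $[n, m]$ for the interval of integers between $n$ and $m$ (with the convention that $[n, m] = \emptyset$ if
$m < n$), we can define the abstract domains for each of the flow sorts as follows:

$$int^\# = \{[n, m] : n \in \mathbb{Z} \cup \{-\infty\} \wedge m \in \mathbb{Z} \cup \{+\infty\}\} \qquad val^\# = int^\# \times bool^\#$$

$$bool^\# = \{\bot_{bool}, true^\#, false^\#, \top_{bool}\} \qquad store^\# = ident^\# \to val^\#$$

$$\llbracket litInt \rrbracket^\# = \lambda n.\, [n, n] \qquad \llbracket id \rrbracket^\# = \lambda \sigma^\#.\, \sigma^\#$$

$$\llbracket intVal \rrbracket^\# = \lambda i.\, (i, \bot_{bool}) \qquad \llbracket boolVal \rrbracket^\# = \lambda b.\, (\bot_{int}, b)$$

$$\llbracket isInt \rrbracket^\# = \lambda (i, b).\, i \qquad \llbracket isBool \rrbracket^\# = \lambda (i, b).\, b$$

$$\llbracket add \rrbracket^\# = \lambda ([l_1, m_1], [l_2, m_2]).\, [l_1 + l_2, m_1 + m_2]$$

$$\llbracket eq \rrbracket^\# = \lambda (i_1, i_2).\, \begin{cases} true^\# & \text{if } i_1 = [n, n] = i_2 \\ false^\# & \text{if } i_1 \cap i_2 = \emptyset \\ \top_{bool} & \text{otherwise} \end{cases} \qquad \llbracket neg \rrbracket^\# = \lambda b.\, \begin{cases} false^\# & \text{if } b = true^\# \\ true^\# & \text{if } b = false^\# \\ b & \text{otherwise} \end{cases}$$

$$\llbracket read \rrbracket^\# = \lambda (\sigma^\#, x).\, \sigma^\#(x) \qquad \llbracket write \rrbracket^\# = \lambda (x, \sigma^\#, v^\#).\, \sigma^\#[x \mapsto v^\#]$$

$$\llbracket isTrue \rrbracket^\# = \lambda b.\, \begin{cases} \bot & \text{if } b \in \{\bot_{bool}, false^\#\} \\ () & \text{otherwise} \end{cases} \qquad \llbracket isFalse \rrbracket^\# = \lambda b.\, \begin{cases} \bot & \text{if } b \in \{\bot_{bool}, true^\#\} \\ () & \text{otherwise} \end{cases}$$

Fig. 12. Abstract interpretation of filters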


Abstract base terms are concrete base terms. We abstract identifiers by themselves, $ident^\# = ident$,
with only the trivial (reflexive) ordering. The abstract domain of Booleans is (isomorphic to) the set
of subsets of Booleans, ordered by inclusion. The abstract domain of values is defined as the
Cartesian product, ordered component-wise, of the abstract domains of integers and Booleans, where
each component gives an approximation of the concrete value, provided that the value is of the
corresponding sort. Stores are mappings from identifiers to values, ordered pointwise. Undefined
identifiers are mapped to the undefined value $\bot_{val^\#}$. The concretisation function $\gamma$ from abstract
domains to concrete domains formalises the relation between concrete and abstract values.

$$\gamma([n, m]) = \{i \mid n \leq i \leq m\} \qquad \gamma(i, b) = \gamma(i) \cup \gamma(b) \qquad \gamma(\top_{bool}) = \{true, false\}$$

$$\gamma(\bot_{bool}) = \emptyset \qquad \gamma(true^\#) = \{true\} \qquad \gamma(false^\#) = \{false\}$$

$$\gamma(\sigma^\#) = \{\sigma \mid \forall x.\, \sigma(x) \in \gamma(\sigma^\#(x))\}$$

The abstraction of the basic filters used in the definition of While is given in Figure 12. Notice
that the abstract interpretations of the filters isInt and isBool return an abstract integer and an
abstract Boolean, respectively, instead of a Boolean stating whether their argument can be an
integer or a Boolean. This is correct because an abstract value that is only an integer has the
shape $(i, \bot_{bool})$, and applying isBool to it returns $\bot_{bool}$, indicating it contains no Boolean.
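
For readers who prefer executable notation, the filters of Figure 12 can be transcribed into Python roughly as follows. This is a sketch under assumptions of our own: intervals are pairs of bounds with `None` for the empty interval, abstract Booleans are strings, `UNDEFINED` stands for the $\bot$ returned by a filter that does not apply, and the treatment of empty intervals in `add` is a choice of this example.

```python
import math

BOT_INT = None                                    # empty interval
BOT_BOOL, TRUE, FALSE, TOP_BOOL = "bot", "true#", "false#", "top"
UNDEFINED = "undefined"                           # filter not defined on its input


def lit_int(n): return (n, n)
def int_val(i): return (i, BOT_BOOL)
def bool_val(b): return (BOT_INT, b)
def is_int(v): return v[0]                        # integer projection of a value
def is_bool(v): return v[1]                       # Boolean projection of a value


def add(i1, i2):
    if i1 is BOT_INT or i2 is BOT_INT:            # assumption: empty stays empty
        return BOT_INT
    return (i1[0] + i2[0], i1[1] + i2[1])


def eq(i1, i2):
    if i1 is not BOT_INT and i1 == i2 and i1[0] == i1[1]:
        return TRUE                               # the same singleton interval
    if i1 is BOT_INT or i2 is BOT_INT or i1[1] < i2[0] or i2[1] < i1[0]:
        return FALSE                              # empty intersection
    return TOP_BOOL


def neg(b): return {TRUE: FALSE, FALSE: TRUE}.get(b, b)
def is_true(b): return UNDEFINED if b in (BOT_BOOL, FALSE) else ()
def is_false(b): return UNDEFINED if b in (BOT_BOOL, TRUE) else ()
```

For example, `is_bool(int_val(lit_int(1)))` yields `BOT_BOOL`, matching the remark above that a value which is only an integer contains no Boolean.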

Lemma 5.12. The abstract filters are consistent with the concrete filters. 

Lemma 5.13. The abstract semantics of While is correct. 

6 DERIVING PROOF TECHNIQUES FROM AN ABSTRACT SEMANTICS 

This section presents several proof techniques derived from an abstract semantics and instantiated 
in our While language. 

6.1 Abstract Rules for Analysing While 

Given the instantiation of the filters used in the abstract semantics of While, we can now derive an
abstract interpretation of While programs. The result of an abstract interpretation of a program
is a set of abstract triples that correctly describes the program behaviour. We shall present the
analysis through a set of syntax-directed inference rules for inferring such triples. For a given term
$c(t_1..t_n)$, we take the corresponding skeleton $Name(c(x_{t_1}..x_{t_n})) := S$ in the semantics and apply the
general abstract interpretation to the skeleton body $S$. This results in a series of conditions for a
triple to be valid that will form the hypotheses of the inference rules.

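Rule for addition. As a first example, we derive a rule for analysing arithmetic expressions such
as $t_1 + t_2$. A triple $(\sigma^\#, t_1 + t_2, v^\#)$ is valid if it belongs to a fixpoint $T^\#$ of $\mathcal{H}^\#$. Unfolding definitions,

$$\begin{array}{l} (\sigma^\#, t_1 + t_2, v^\#) \in T^\# = \mathcal{H}^\#(T^\#) \\ \Leftarrow \left\llbracket \begin{array}{l} H(x_\sigma, x_{t_1}, x_{f_1});\ isInt(x_{f_1}) \mathrel{?>} x_{f_1'};\ H(x_\sigma, x_{t_2}, x_{f_2}); \\ isInt(x_{f_2}) \mathrel{?>} x_{f_2'};\ add(x_{f_1'}, x_{f_2'}) \mathrel{?>} x_{f_3};\ intVal(x_{f_3}) \mathrel{?>} x_o \end{array} \right\rrbracket^\# (\top, \Sigma_0^\#, T^\#) \Downarrow (\top, \Sigma_o^\#) \\ \quad \wedge\ \Sigma_0^\# = x_\sigma \mapsto \sigma^\# + x_{t_1} \mapsto t_1 + x_{t_2} \mapsto t_2 \ \wedge\ v^\# = \Sigma_o^\#(x_o) \end{array}$$

For simplicity, we here choose to ignore weakenings and the non-$\top$ case for the flag $f$. In other
words, we are ignoring the possibility of short-cutting the abstract interpretation of the rule if a $\bot$
is found during the abstract execution.

$$\begin{array}{l} (\sigma^\#, t_1 + t_2, v^\#) \in \mathcal{H}^\#(T^\#) \\ \Leftarrow (\Sigma_0^\#(x_\sigma), \Sigma_0^\#(x_{t_1}), v_1^\#) \in T^\# \\ \quad \wedge \left\llbracket \begin{array}{l} isInt(x_{f_1}) \mathrel{?>} x_{f_1'};\ H(x_\sigma, x_{t_2}, x_{f_2});\ isInt(x_{f_2}) \mathrel{?>} x_{f_2'}; \\ add(x_{f_1'}, x_{f_2'}) \mathrel{?>} x_{f_3};\ intVal(x_{f_3}) \mathrel{?>} x_o \end{array} \right\rrbracket^\# (\top, \Sigma_1^\#, T^\#) \Downarrow (\top, \Sigma_o^\#) \\ \quad \wedge\ \Sigma_0^\# = x_\sigma \mapsto \sigma^\# + x_{t_1} \mapsto t_1 + x_{t_2} \mapsto t_2 \ \wedge\ \Sigma_1^\# = \Sigma_0^\# + x_{f_1} \mapsto v_1^\# \ \wedge\ v^\# = \Sigma_o^\#(x_o) \end{array}$$

Interpreting the filter isInt makes us consider the integer projection of the abstract value $v_1^\#$.
We can thus rewrite the implication as follows.

$$\begin{array}{l} (\sigma^\#, t_1 + t_2, v^\#) \in \mathcal{H}^\#(T^\#) \\ \Leftarrow (\sigma^\#, t_1, v_1^\#) \in T^\# \ \wedge\ v_1^\# = (i_1^\#, b_1^\#) \\ \quad \wedge \left\llbracket \begin{array}{l} H(x_\sigma, x_{t_2}, x_{f_2});\ isInt(x_{f_2}) \mathrel{?>} x_{f_2'}; \\ add(x_{f_1'}, x_{f_2'}) \mathrel{?>} x_{f_3};\ intVal(x_{f_3}) \mathrel{?>} x_o \end{array} \right\rrbracket^\# (\top, \Sigma_2^\#, T^\#) \Downarrow (\top, \Sigma_o^\#) \\ \quad \wedge\ \Sigma_2^\# = x_\sigma \mapsto \sigma^\# + x_{t_1} \mapsto t_1 + x_{t_2} \mapsto t_2 + x_{f_1} \mapsto v_1^\# + x_{f_1'} \mapsto i_1^\# \ \wedge\ v^\# = \Sigma_o^\#(x_o) \end{array}$$

We can continue unfolding the abstract interpretation of the rule. We eventually reach the
following implication:

$$\begin{array}{l} (\sigma^\#, t_1 + t_2, v^\#) \in \mathcal{H}^\#(T^\#) \\ \Leftarrow (\sigma^\#, t_1, v_1^\#) \in T^\# \wedge (\sigma^\#, t_2, v_2^\#) \in T^\# \wedge v_1^\# = (i_1^\#, b_1^\#) \wedge v_2^\# = (i_2^\#, b_2^\#) \wedge v^\# = \Sigma_o^\#(x_o) \\ \quad \wedge\ \llbracket add(x_{f_1'}, x_{f_2'}) \mathrel{?>} x_{f_3};\ intVal(x_{f_3}) \mathrel{?>} x_o \rrbracket^\# (\top, \Sigma_4^\#, T^\#) \Downarrow (\top, \Sigma_o^\#) \\ \quad \wedge\ \Sigma_4^\# = x_\sigma \mapsto \sigma^\# + x_{t_1} \mapsto t_1 + x_{t_2} \mapsto t_2 + x_{f_1} \mapsto v_1^\# + x_{f_1'} \mapsto i_1^\# + x_{f_2} \mapsto v_2^\# + x_{f_2'} \mapsto i_2^\# \\ \Leftarrow (\sigma^\#, t_1, (i_1^\#, b_1^\#)) \in T^\# \wedge (\sigma^\#, t_2, (i_2^\#, b_2^\#)) \in T^\# \wedge \llbracket add \rrbracket^\#(i_1^\#, i_2^\#) = i^\# \wedge v^\# = (i^\#, \bot_{bool}) \end{array}$$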
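By writing $\sigma^\# \vdash t : v^\#$ for $(\sigma^\#, t, v^\#) \in T^\#$ we get the familiar rule below.

$$\frac{\sigma^\# \vdash t_1 : (i_1^\#, b_1^\#) \qquad \sigma^\# \vdash t_2 : (i_2^\#, b_2^\#) \qquad \llbracket add \rrbracket^\#(i_1^\#, i_2^\#) = i^\#}{\sigma^\# \vdash t_1 + t_2 : (i^\#, \bot_{bool})}$$

Rule for conditionals. In the case of the addition, the structure of the skeleton was linear. We have
seen that we ignored some branches (the ones triggering $\bot$), but these were not very important. We
now show the example of conditionals, where branches are more visible. A triple $(\sigma^\#, if\ t_1\ t_2\ t_3, \sigma_o^\#)$
is valid if it belongs to a fixpoint $T^\#$ of $\mathcal{H}^\#$. Unfolding definitions, and passing through the linear
part of the skeleton, we get:

$$\begin{array}{l} (\sigma^\#, if\ t_1\ t_2\ t_3, \sigma_o^\#) \in \mathcal{H}^\#(T^\#) \\ \Leftarrow \left\llbracket H(x_\sigma, x_{t_1}, x_{f_1});\ isBool(x_{f_1}) \mathrel{?>} x_{f_1'};\ \left( \begin{array}{l} isTrue(x_{f_1'});\ H(x_\sigma, x_{t_2}, x_o) \\ isFalse(x_{f_1'});\ H(x_\sigma, x_{t_3}, x_o) \end{array} \right)_{\{x_o\}} \right\rrbracket^\# (\top, \Sigma_0^\#, T^\#) \Downarrow (\top, \Sigma_o^\#) \\ \quad \wedge\ \Sigma_0^\# = x_\sigma \mapsto \sigma^\# + x_{t_1} \mapsto t_1 + x_{t_2} \mapsto t_2 + x_{t_3} \mapsto t_3 \ \wedge\ \sigma_o^\# = \Sigma_o^\#(x_o) \\ \Leftarrow (\sigma^\#, t_1, v_1^\#) \in T^\# \ \wedge\ v_1^\# = (i_1^\#, b_1^\#) \\ \quad \wedge \left\llbracket \left( \begin{array}{l} isTrue(x_{f_1'});\ H(x_\sigma, x_{t_2}, x_o) \\ isFalse(x_{f_1'});\ H(x_\sigma, x_{t_3}, x_o) \end{array} \right)_{\{x_o\}} \right\rrbracket^\# (\top, \Sigma_2^\#, T^\#) \Downarrow (\top, \Sigma_o^\#) \\ \quad \wedge\ \Sigma_2^\# = x_\sigma \mapsto \sigma^\# + x_{t_1} \mapsto t_1 + x_{t_2} \mapsto t_2 + x_{t_3} \mapsto t_3 + x_{f_1} \mapsto v_1^\# + x_{f_1'} \mapsto b_1^\# \end{array}$$

From this stage, we continue the analysis in each of the two subbranches to build a map $O$
representing the outputs of both branches. We consider two cases, depending on the value of $b_1^\#$.

First, consider the case where $b_1^\#$ is $\top_{bool}$. We then have both isTrue and isFalse holding on $b_1^\#$. By unfolding definitions
and using weakening for the results of the two hooks, we get the following implication:

$$\begin{array}{l} (\sigma^\#, if\ t_1\ t_2\ t_3, \sigma_o^\#) \in \mathcal{H}^\#(T^\#) \\ \Leftarrow (\sigma^\#, t_1, v_1^\#) \in T^\# \wedge (\sigma^\#, t_2, \sigma_2^\#) \in T^\# \wedge (\sigma^\#, t_3, \sigma_3^\#) \in T^\# \\ \quad \wedge\ v_1^\# = (i_1^\#, \top_{bool}) \wedge \sigma_2^\# \sqsubseteq \sigma_o^\# \wedge \sigma_3^\# \sqsubseteq \sigma_o^\# \end{array}$$

Using the same notations as above we can simplify this rule as below.

$$\frac{\sigma^\# \vdash t_1 : (i_1^\#, \top_{bool}) \qquad \sigma^\# \vdash t_2 : \sigma_2^\# \qquad \sigma_2^\# \sqsubseteq \sigma_o^\# \qquad \sigma^\# \vdash t_3 : \sigma_3^\# \qquad \sigma_3^\# \sqsubseteq \sigma_o^\#}{\sigma^\# \vdash if\ t_1\ t_2\ t_3 : \sigma_o^\#}$$

This rule is imprecise (we assume that we get $\top_{bool}$ when evaluating the conditional's expression),
but shows how our equivalent of concrete rules are merged in the abstract interpretation. We now
consider a more precise version of the rule, for the case when the conditional expression evaluates
to $true^\#$. The other cases $false^\#$ and $\bot_{bool}$ are similar. In this case, the isTrue filter holds, but not
isFalse: we can derive the judgement below when $\Sigma^\#(x_{f_1'}) = true^\#$.

$$\llbracket isFalse(x_{f_1'});\ H(x_\sigma, x_{t_3}, x_o) \rrbracket^\# (\top, \Sigma^\#, T^\#) \Downarrow (\bot, \Sigma_o^\#)$$

Following the rules for abstract interpretation (see Figure 10), this removes the second branch from
the $\mathcal{E}$ set, only leaving constraints from the first branch. We thus get the following implication,
where we no longer need the weakening for the result of the hook.

$$\begin{array}{l} (\sigma^\#, if\ t_1\ t_2\ t_3, \sigma_o^\#) \in \mathcal{H}^\#(T^\#) \\ \Leftarrow (\sigma^\#, t_1, v_1^\#) \in T^\# \wedge (\sigma^\#, t_2, \sigma_o^\#) \in T^\# \wedge v_1^\# = (i_1^\#, true^\#) \end{array}$$

We can rewrite this implication as above into the rule
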
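$$\frac{\sigma^\# \vdash t_1 : (i_1^\#, true^\#) \qquad \sigma^\# \vdash t_2 : \sigma_o^\#}{\sigma^\# \vdash if\ t_1\ t_2\ t_3 : \sigma_o^\#}$$

Rule for loops. The skeleton for loops is close to the one for conditionals. We can similarly derive
abstract rules such as the ones below.

$$\frac{\sigma^\# \vdash t_1 : (i_1^\#, \top_{bool}) \qquad \sigma^\# \vdash t_2 : \sigma_2^\# \qquad \sigma_2^\# \vdash while\ t_1\ t_2 : \sigma_3^\# \qquad \sigma_3^\# \sqsubseteq \sigma_o^\# \qquad \sigma^\# \sqsubseteq \sigma_o^\#}{\sigma^\# \vdash while\ t_1\ t_2 : \sigma_o^\#}$$

$$\frac{\sigma^\# \vdash t_1 : (i_1^\#, true^\#) \qquad \sigma^\# \vdash t_2 : \sigma_2^\# \qquad \sigma_2^\# \vdash while\ t_1\ t_2 : \sigma_o^\#}{\sigma^\# \vdash while\ t_1\ t_2 : \sigma_o^\#} \qquad \frac{\sigma^\# \vdash t_1 : (i_1^\#, false^\#)}{\sigma^\# \vdash while\ t_1\ t_2 : \sigma^\#}$$

We can also use the fact that any fixpoint of $\mathcal{H}^\#$ is considered valid. The following implication
(which we can prove in a way similar to above) is valid for any well-formed set $T^\#$.

$$\begin{array}{l} (\sigma^\#, while\ t_1\ t_2, \sigma_o^\#) \in \mathcal{H}^\#(T^\#) \\ \Leftarrow (\sigma^\#, t_1, v_1^\#) \in T^\# \wedge v_1^\# \sqsubseteq (i_1^\#, \top_{bool}) \\ \quad \wedge\ (\sigma^\#, t_2, \sigma_2^\#) \in T^\# \wedge (\sigma_2^\#, while\ t_1\ t_2, \sigma_3^\#) \in T^\# \wedge \sigma_3^\# \sqsubseteq \sigma_o^\# \wedge \sigma^\# \sqsubseteq \sigma_o^\# \end{array}$$

In particular, as the condition $v_1^\# \sqsubseteq (i_1^\#, \top_{bool})$ is vacuously true, we can weaken this implication as
follows (forcing all intermediate states to be the same).

$$(\sigma^\#, while\ t_1\ t_2, \sigma^\#) \in \mathcal{H}^\#(T^\#) \Leftarrow (\sigma^\#, t_1, v^\#) \in T^\# \wedge (\sigma^\#, t_2, \sigma^\#) \in T^\# \wedge (\sigma^\#, while\ t_1\ t_2, \sigma^\#) \in T^\#$$

This implication means that given any $T_0^\#$ such that $T_0^\# \subseteq \mathcal{H}^\#(T_0^\#)$, that associates $t_1$ in the state $\sigma^\#$
with a result (that is, there exists $v^\#$ such that $(\sigma^\#, t_1, v^\#) \in T_0^\#$), and such that $(\sigma^\#, t_2, \sigma^\#) \in T_0^\#$,
we can extend $T_0^\#$ into $T_1^\# = T_0^\# \cup \{(\sigma^\#, while\ t_1\ t_2, \sigma^\#)\}$. By monotonicity of $\mathcal{H}^\#$, we get $T_0^\# \subseteq \mathcal{H}^\#(T_1^\#)$,
and by the above implication, we get $(\sigma^\#, while\ t_1\ t_2, \sigma^\#) \in \mathcal{H}^\#(T_1^\#)$. Hence, $T_1^\# \subseteq \mathcal{H}^\#(T_1^\#)$, and every
triple in $T_1^\#$ is correct in relation to $\Downarrow$. In other words, the following familiar rule is admissible.

$$\frac{\sigma^\# \vdash t_1 : v^\# \qquad \sigma^\# \vdash t_2 : \sigma^\#}{\sigma^\# \vdash while\ t_1\ t_2 : \sigma^\#}$$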


6.2 State Splitting 

As another example of the use of the abstract interpretation, we show how to extend the abstract
semantics to obtain more precise results. Our motivating example is $t$: while $\neg(x = 0)$ $x := x - 1$, for
which we want to show that the triple $(x \mapsto [0, \infty], t, x \mapsto 0)$ is correct (we simplify notation and
write $n$ for $([n, n], \bot_{bool})$, and $[n, m]$ for $([n, m], \bot_{bool})$). Proving this is not possible as such. To see this,
observe that in the rule for While, the same state is used to run the expression and the statement,
hence the return value of the expression is not reflected in the state (it may only prevent a branch
from being taken). Communicating information from an expression back to a state is a non-trivial
problem which depends on the language considered, but we can help the abstract interpretation by
splitting the state in three parts: $\{(x \mapsto 0, t, x \mapsto 0), (x \mapsto [1, \infty], t, x \mapsto 0), (x \mapsto [0, \infty], t, x \mapsto 0)\}$.
Let $T^\#$ be the set of triples (listed below) obtained by adding triples for every sub-expression of
$t$. We can show that $\{(x \mapsto 0, t, x \mapsto 0), (x \mapsto [1, \infty], t, x \mapsto 0)\} \subseteq \mathcal{H}^\#(T^\#)$ (the second triple uses
$(x \mapsto [0, \infty], t, x \mapsto 0)$ to evaluate the recursive while term). However, there is still one of the three
triples that cannot be derived, viz., $(x \mapsto [0, \infty], t, x \mapsto 0) \notin \mathcal{H}^\#(T^\#)$.

To derive this third triple, we introduce a proof technique called state splitting to obtain a
more precise abstract semantics. The core idea of the technique is that if the state $\sigma^\#$ of a triple
$(\sigma^\#, t^\#, v^\#)$ is covered by the states of some triples $(\sigma_1^\#, t^\#, v^\#)..(\sigma_n^\#, t^\#, v^\#)$, in the sense that $\gamma(\sigma^\#) \subseteq
\gamma(\sigma_1^\#) \cup .. \cup \gamma(\sigma_n^\#)$, then we may use $(\sigma^\#, t^\#, v^\#)$ in the input triple set $T^\#$ of $\mathcal{H}^\#(T^\#)$ without having
to show that $(\sigma^\#, t^\#, v^\#)$ is in the resulting triple set $\mathcal{H}^\#(T^\#)$, and still remain correct.

Formally, we first define a function $Sp$ from triple sets to triple sets that adds such triples.

Definition 6.1. Let $T^\#$ be an abstract triple set. We define the state splitting function $Sp(T^\#)$ as:

$$Sp(T^\#) = \left\{ (\sigma^\#, t^\#, v^\#) \;\middle|\; \begin{array}{l} \{(\sigma_1^\#, t^\#, v^\#)..(\sigma_n^\#, t^\#, v^\#)\} \subseteq T^\# \text{ with } n \geq 1 \\ \forall i \in [1..n].\, Sort(\sigma^\#) = Sort(\sigma_i^\#) \\ \gamma(\sigma^\#) \subseteq \gamma(\sigma_1^\#) \cup .. \cup \gamma(\sigma_n^\#) \end{array} \right\}$$
Lemma 6.2. For any $T^\#$, $T^\# \subseteq Sp(T^\#)$, and $Sp$ is monotonic.

Lemma 6.3. Let $T^\#$ be a well-formed triple set. Then $Sp(T^\#)$ is well formed.

Proof. Let $(\sigma^\#, t^\#, v^\#) \in Sp(T^\#)$. Then there is some $(\sigma_i^\#, t^\#, v^\#) \in T^\#$ such that $Sort(\sigma^\#) = Sort(\sigma_i^\#) =
in(t^\#)$ and $Sort(v^\#) = out(t^\#)$. □

We next show that the functional $Sp(\mathcal{H}^\#(Sp(\cdot)))$ has the same consistency property as $\mathcal{H}^\#(\cdot)$.

Lemma 6.4. Let $T$ and $T^\#$ be well formed and consistent triple sets. Then $\mathcal{H}(T)$ and $Sp(\mathcal{H}^\#(Sp(T^\#)))$
are well formed and consistent triple sets.

We finally state that the proof technique is correct. 

Lemma 6.5. Let $T^\#$ be a well-formed abstract triple set. If $T^\# \subseteq \mathcal{H}^\#(Sp(T^\#))$, then $Sp(T^\#)$ is correct.

We turn back to our example. Consider the following triple set.

$$T^\# = \left\{ \begin{array}{l} (x \mapsto 0, 0, 0),\ (x \mapsto [1, \infty], 0, 0),\ (x \mapsto 0, -1, -1),\ (x \mapsto [1, \infty], -1, -1), \\ (x \mapsto 0, x, 0),\ (x \mapsto [1, \infty], x, [1, \infty]), \\ (x \mapsto 0, x = 0, true^\#),\ (x \mapsto [1, \infty], x = 0, false^\#), \\ (x \mapsto 0, \neg(x = 0), false^\#),\ (x \mapsto [1, \infty], \neg(x = 0), true^\#), \\ (x \mapsto [1, \infty], x - 1, [0, \infty]),\ (x \mapsto [1, \infty], x := x - 1, x \mapsto [0, \infty]), \\ (x \mapsto 0, t, x \mapsto 0),\ (x \mapsto [1, \infty], t, x \mapsto 0) \end{array} \right\}$$

We can show that $T^\# \subseteq Sp(\mathcal{H}^\#(Sp(T^\#)))$, hence every triple of $Sp(T^\#)$ is correct, in particular
$(x \mapsto [0, \infty], t, x \mapsto 0)$.

Note that this proof technique does not depend on the programming language considered. The 
difficulty is transferred to the choice of how to split the state, but as long as the splitting is correct 
(the added triple is covered by the existing ones), the resulting technique is sound. 
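
The covering condition at the heart of state splitting is easy to check mechanically for simple domains. The Python sketch below does so for abstract states consisting of a single integer interval (as in the example with the variable x); the encoding of states as pairs of bounds and the function names are assumptions of this illustration.

```python
import math


def covered(target, parts):
    """Is every integer of the interval `target` in some interval of `parts`?"""
    lo, hi = target
    if hi < lo:
        return True                   # the empty interval is always covered
    point = lo                        # smallest integer not yet known covered
    for (a, b) in sorted(parts):
        if a > point:
            return False              # gap: no later interval can contain `point`
        if b >= point:
            if b >= hi:
                return True
            point = b + 1
    return False


def in_splitting(triple, triples):
    """Does `triple` belong to the state splitting of `triples`?"""
    state, term, result = triple
    parts = [s for (s, t, v) in triples if t == term and v == result]
    return bool(parts) and covered(state, parts)


KNOWN = {((0, 0), "t", (0, 0)), ((1, math.inf), "t", (0, 0))}
```

With `KNOWN` holding the two split triples, `in_splitting(((0, math.inf), "t", (0, 0)), KNOWN)` holds because $[0, 0]$ and $[1, \infty]$ together contain every integer of $[0, \infty]$.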


7 CONSTRAINT GENERATION 


As a final interpretation, we show how the abstract interpretation can be used to construct an
actual program analyser. We define the analyser as an interpretation that generates data flow
constraints to analyse a given program [Nielson et al. 1999].³ Constraint-based program analysis
is a well-known technique for defining analyses. We show how this technique can be lifted and
defined entirely as an interpretation, by generating constraints over all the flow variables used in a
semantic definition.

We first need to formalise (and extend) the standard notion of program point. We take a program
point $pp$ to be a list of integers denoting a position in a term. Program points form a monoid with
concatenation operator $\cdot$ and neutral element $\epsilon$. We define a subterm operator $t@pp$ as follows.


\[
t@\epsilon = t
\qquad
c(t_1..t_n)@(k \cdot pp) =
\begin{cases}
t_k@pp & \text{if } k \in [1..n] \\
\text{undefined} & \text{otherwise}
\end{cases}
\]
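
As a minimal sketch, the subterm operator can be written over terms encoded as nested tuples. The encoding is an assumption made for illustration: a term is `(constructor, child_1, ..., child_n)`, a program point is a sequence of 1-based indices, and `None` stands for "undefined". The constructor names are hypothetical.

```python
def subterm(t, pp):
    """Follow the program point `pp` (1-based child indices) inside term `t`.

    Returns None when the position does not exist, mirroring the
    'undefined otherwise' case of the definition.
    """
    for k in pp:
        # t[0] is the constructor name, so children sit at indices 1..n.
        if not isinstance(t, tuple) or not 1 <= k <= len(t) - 1:
            return None
        t = t[k]
    return t

# while not(x = 0) x := x - 1, with hypothetical constructor names.
t0 = ("while",
      ("neg", ("eq", ("var", "x"), ("const", 0))),
      ("asn", "x", ("sub", ("var", "x"), ("const", 1))))

print(subterm(t0, [1, 1, 1]))
print(subterm(t0, [3]))
```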


³ We impose the technical restriction that any hook used in a skeleton can be matched to a program point of the program
(closed term) $t_0$ under consideration. Thus constraint-based analysis of code-generating code is not considered here.

We assume a function $PP$ that, for a given term $t_0$, states the set of program points for which
constraints will be generated. It typically consists of the set of executable subterms of $t_0$. We require
the program points of $PP(t_0)$ to be executable: if $pp \in PP(t_0)$, then $t_0@pp = c(t_1..t_n)$. Requirement 2.2
enforces the existence of a skeleton for this term.

We next define a partial operator $HPP_{t_0}$ that associates program points to the terms occurring
in the hooks of a skeleton. Formally, if $HPP_{t_0}(pp, N, t) = pp'$, then (1) skeleton $N$ is applicable:
$pp \in PP(t_0)$, $t_0@pp = c(t_1..t_n)$, and $N$ is of the form $N(c(x_{t_1}..x_{t_n})) := S$, (2) a hook $H(\_, t, \_)$ occurs
in $S$, and (3) the resulting program point is part of the set of explored program points: $pp' \in PP(t_0)$
and $t_0@pp' = (x_{t_1} \mapsto t_1..x_{t_n} \mapsto t_n)(t)$.

Constraints are either of the form $[x = x']$, $[x \sqsubseteq x']$, or $[x : s]$, where $x$ and $x'$ are variables and
$s$ a sort. We generate variable names in constraints of the form $pp\text{-}x$. The constraint generation
function $Gen$ that takes a program $t_0$ and returns the set of constraints generated by $t_0$ is defined as

\[
Gen(t_0) = \bigcup \left\{ C \cup \left\{
\begin{array}{l}
[pp\text{-}x_\sigma : in(Sort(t_0@pp))] \\
[pp\text{-}x_o : out(Sort(t_0@pp))] \\
\forall i \in [1..n].\ [pp\text{-}x_{t_i} = t_i]
\end{array}
\right\} \;\middle|\;
\begin{array}{l}
pp \in PP(t_0) \wedge t_0@pp = c(t_1..t_n) \\
N(c(x_{t_1}..x_{t_n})) := S \in Rules \\
\llbracket S \rrbracket^c (N, pp, \emptyset) \Downarrow C \\
D_N(\emptyset) = \{x_{t_1}..x_{t_n}, x_\sigma\} \wedge x_o \in D_N(C)
\end{array}
\right\}
\]

For each skeleton $N$ we define a function $D_N$ that maps sets of constraints to sets of skeletal
variables. This is not necessary for the constraint generation but is used to prove consistency
between constraints and the abstract semantics.

The constraint generation interpretation of skeletons $\llbracket S \rrbracket^c$ is given in Figure 13. The rule for
hooks generates constraints for connecting the input state $pp'\text{-}x_\sigma$ with the flow variable holding
the input state in the hook $pp\text{-}x_{f_1}$, and the resulting output state of the hook with the output of
the hook. Each filter comes with a constraint generation function $\llbracket F \rrbracket^c$ specific to the analysis
of that filter. We require that the constraints generated for that filter agree with the abstract
semantics: if $\mathcal{S}$ is a solution to the constraints $\llbracket F \rrbracket^c(pp\text{-}x_1..pp\text{-}x_n,\ pp\text{-}y_1..pp\text{-}y_m)$, then the following
holds: $\llbracket F \rrbracket^\sharp(\mathcal{S}(pp\text{-}x_1)..\mathcal{S}(pp\text{-}x_n)) \sqsubseteq (\mathcal{S}(pp\text{-}y_1)..\mathcal{S}(pp\text{-}y_m))$. For analysing a set of branches, we
generate constraints for each branch and return the union of these constraint sets.

Correctness. A solution $\mathcal{S}$ of a set of constraints $C$ is a mapping from the variables in $C$ to abstract
values and terms such that every constraint in $C$ holds.

Lemma 7.1. Let $t_0$ be a term and $\mathcal{S}$ be a solution of $Gen(t_0)$. Let $T^\sharp$ be defined as follows:

\[
T^\sharp = \left\{ (\sigma^\sharp, t, v^\sharp) \;\middle|\;
\begin{array}{l}
pp \in PP(t_0) \\
t = t_0@pp \\
\mathcal{S}(pp\text{-}x_\sigma) = \sigma^\sharp \\
\mathcal{S}(pp\text{-}x_o) = v^\sharp
\end{array}
\right\}
\]

Then $T^\sharp$ is well typed and $T^\sharp \subseteq \mathcal{H}^\sharp(T^\sharp)$.

\[
\frac{
HPP_{t_0}(pp, N, t) = pp' \qquad x_{f_1} \in D_N(C) \qquad
C' = C \cup \{ [pp\text{-}x_{f_1} \sqsubseteq pp'\text{-}x_\sigma],\ [pp'\text{-}x_o \sqsubseteq pp\text{-}x_{f_2}] \} \qquad
D_N(C') = D_N(C) \cup \{x_{f_2}\}
}{
\llbracket H(x_{f_1}, t, x_{f_2}) \rrbracket^c (N, pp, C) \Downarrow (N, pp, C')
}
\]

\[
\frac{
\{x_1..x_n\} \subseteq D_N(C) \qquad
\llbracket F \rrbracket^c(pp\text{-}x_1..pp\text{-}x_n,\ pp\text{-}y_1..pp\text{-}y_m) = C_f \qquad
C' = C \cup C_f \qquad
D_N(C') = D_N(C) \cup \{y_1..y_m\}
}{
\llbracket F(x_1..x_n) \mathrel{?\triangleright} (y_1..y_m) \rrbracket^c (N, pp, C) \Downarrow (N, pp, C')
}
\]

\[
\frac{
\forall i \in [1..n].\ O(i) = C_i \qquad
\forall i \in [1..n].\ V \subseteq D_N(C_i) \qquad
C' = C \cup \bigcup_{i \in [1..n]} C_i \qquad
D_N(C') = D_N(C) \cup V
}{
\oplus_V (O, (N, pp, C)) \Downarrow (N, pp, C')
}
\qquad
\frac{}{\llbracket [\,] \rrbracket^c (N, pp, C) \Downarrow C}
\]

Fig. 13. Constraint Generation

Discussion. The constraints we generate are path-insensitive: they do not capture the fact that
when a filter does not hold, the rest of the skeleton does not matter. Constraints can be
path-sensitive by letting the state of the interpretation be a pair consisting of a set Stop of constraint
sets representing pathways in the skeleton that are stopped, similar to the $\bot$ flag in the abstract
interpretation, and another set Run of constraint sets representing all the running paths. When a
filter is encountered, the Run sets are added to Stop with the additional constraint that the filter
returns $\bot$. The usual constraints for the filter are added to each set in Run. In a nutshell, we duplicate
constraints for each filter: once when it does not hold, and once when it may hold. At the end of
the interpretation, the global constraint to be satisfied is the disjunction of all constraint sets in
Run and Stop, each constraint set interpreted as a conjunction of its atomic constraints.
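
The Stop/Run bookkeeping described in this discussion can be sketched as follows. This is one possible reading, not the paper's definition: constraints are plain strings, constraint sets are frozensets, and the function name `on_filter` and its parameters are hypothetical.

```python
def on_filter(stop, run, usual, fails):
    """Update (Stop, Run) when a filter is encountered.

    stop, run: sets of frozensets of constraints (each frozenset is one path).
    usual:     frozenset of the ordinary constraints generated for the filter.
    fails:     the single constraint stating that the filter returns bottom.
    """
    # Every running path may stop here: copy it to Stop with the failure constraint.
    new_stop = stop | {path | {fails} for path in run}
    # Every running path may also continue: extend it with the usual constraints.
    new_run = {path | usual for path in run}
    return new_stop, new_run

# Start with one empty running path and go through two filters in sequence.
stop, run = set(), {frozenset()}
stop, run = on_filter(stop, run, frozenset({"c1"}), "F1 = bot")
stop, run = on_filter(stop, run, frozenset({"c2"}), "F2 = bot")
print(len(stop), len(run))
```

After two filters there are two stopped paths (one per filter) and one running path; the global constraint is then the disjunction over all of them.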

Example. Consider $t_0 = \mathit{while}\ \neg(x = 0)\ x := x - 1$. Its executable subterms are

\[ PP(t_0) = \{\epsilon,\ 1,\ 1 \cdot 1,\ 1 \cdot 1 \cdot 1,\ 1 \cdot 1 \cdot 2,\ 2,\ 2 \cdot 2,\ 2 \cdot 2 \cdot 1,\ 2 \cdot 2 \cdot 2\}. \]
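
The set of executable program points of this example can be enumerated mechanically. The sketch below assumes a hypothetical encoding that is not in the paper: executable terms are tuples `(constructor, child_1, ..., child_n)`, while identifiers and literals are plain Python values and therefore not executable.

```python
def program_points(t, pp=()):
    """List the program points of the executable subterms of `t`, depth first.

    A program point is a tuple of 1-based child indices; () plays the role
    of the neutral element (the root).
    """
    if not isinstance(t, tuple):
        return []  # identifiers and literals are not executable
    points = [pp]
    for k, child in enumerate(t[1:], start=1):
        points += program_points(child, pp + (k,))
    return points

# while not(x = 0) x := x - 1, with hypothetical constructor names.
t0 = ("while",
      ("neg", ("eq", ("var", "x"), ("const", 0))),
      ("asn", "x", ("sub", ("var", "x"), ("const", 1))))

for point in program_points(t0):
    print(point)
```

The assigned identifier at position 2·1 is skipped because it is not an executable term, which is why that point does not appear in the set above.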

Note that the subterm $x$ appears both as program points $1 \cdot 1 \cdot 1$ and $2 \cdot 2 \cdot 1$ of $t_0$. For the different
filters we generate symbolic constraints that will reuse abstract filters: $\llbracket isBool \rrbracket^c(x, y) =
\{[y = isBool(x)]\}$. A mapping $\mathcal{S}$ is then a solution of such a symbolic constraint if $\mathcal{S}(y) =
\llbracket isBool \rrbracket^\sharp(\mathcal{S}(x))$.

The definition of $Gen(t_0)$ generates a large number of constraints. We focus on a selection of
them: those generated by the initial program point $\epsilon$. The associated skeleton is

\[ While(\mathit{while}\ x_{t_1}\ x_{t_2}) := [\, H(x_\sigma, x_{t_1}, x_{f_1});\ isBool(x_{f_1}) \mathrel{?\triangleright} x_{f_1'};\ \ldots \,]. \]

The constraint generation then produces the constraints

\[ [\epsilon\text{-}x_\sigma : store], \quad [\epsilon\text{-}x_{t_1} = \neg(x = 0)], \quad [\epsilon\text{-}x_o : store], \quad [\epsilon\text{-}x_{t_2} = x := x - 1], \]

as well as the constraints given by $\llbracket H(x_\sigma, x_{t_1}, x_{f_1});\ isBool(x_{f_1}) \mathrel{?\triangleright} x_{f_1'};\ \ldots \rrbracket^c (N, pp, \emptyset)$. The hook
case links the variable $\epsilon\text{-}x_\sigma$ to the input of $x_{t_1}$, which here represents $\neg(x = 0)$: $HPP_{t_0}(\epsilon, While, x_{t_1}) =
1$ and we thus generate the two constraints

\[ [\epsilon\text{-}x_\sigma \sqsubseteq 1\text{-}x_\sigma], \quad [1\text{-}x_o \sqsubseteq \epsilon\text{-}x_{f_1}]. \]

c            Signature
in           expr
out          expr → stat
throw        stat
try catch    (stat, stat) → stat
ref          expr → expr
!            expr → expr
←            (expr, expr) → stat

Fig. 14. Additional Constructors for While

f            fsort(f)
in           in → (val, in)
alloc        (heap, val) → (heap, loc)
locVal       loc → val
isLoc        val → loc
get          (loc, heap) → val
set          (loc, heap, val) → heap
out          (out, val) → out
mkSt         (in, out, store, heap) → state
splitSt      state → (in, out, store, heap)
mkValSt      (val, state) → valState
getValSt     valState → (val, state)
mkOK         state → excState
mkExc        state → excState
isOK         excState → state
isExc        excState → state

Fig. 15. Additional filters


The constraints on $1\text{-}x_\sigma$ and $1\text{-}x_o$ are generated when considering the program point 1, corresponding
to the evaluation of $\neg(x = 0)$ (corresponding to the skeleton Neg). As stated, the set of all generated
constraints is large; it is provided in the supplementary material on the companion website.
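
The two constraints produced by the hook case of this example can be reproduced with a small sketch. The tuple encoding of constraints, the textual variable names, and the function name `hook_constraints` are assumptions made for illustration only.

```python
def hook_constraints(pp, pp_hook, x_in, x_out):
    """Constraints generated for a hook H(x_in, t, x_out) met at program point `pp`,
    where `pp_hook` is the program point associated with the hooked term t."""
    def var(point, name):
        # Render a program point as text; the root is written "eps".
        label = ".".join(map(str, point)) or "eps"
        return f"{label}-{name}"
    return [
        # The state flowing into the hook is below the input of the hooked term.
        ("leq", var(pp, x_in), var(pp_hook, "x_sigma")),
        # The output of the hooked term is below the flow variable receiving it.
        ("leq", var(pp_hook, "x_o"), var(pp, x_out)),
    ]

# First hook of the While skeleton at the root; the condition sits at point 1.
constraints = hook_constraints((), (1,), "x_sigma", "x_f1")
for c in constraints:
    print(c)
```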

8 EXTENDING WHILE WITH EXCEPTIONS, INPUT/OUTPUT, AND A HEAP 

To further illustrate the use of skeletal semantics, we extend our While language with exceptions,
input/output, and a heap. We first need to define new flow sorts: in for input streams, out for output
streams, heap for heaps, loc for locations in the heap, state for the combination of the streams
with a store and a heap, valState for the further combination with a value, and excState for a state
extended to signal whether an exception was raised. We still have two program sorts (expr and stat),
but their input flow sorts are state, and their output flow sorts are now valState for expressions and
excState for statements. Figure 14 lists the additional constructors of our language. The additional
filters are defined in Figure 15, the rules for expressions in Figure 16, and the rules for statements
in Figure 17. To make the rules easier to read, flow variables have names related to their sorts: $\sigma$ for state,
$w$ for valState, $v$ for val, $n$ for int, $i$ for in, and so on.

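\[
\begin{array}{lcl}
Lit(const(x_t)) & := & [\, litInt(x_t) \mathrel{?\triangleright} x_{f_n};\ intVal(x_{f_n}) \mathrel{?\triangleright} x_{f_v};\ mkValSt(x_{f_v}, x_\sigma) \mathrel{?\triangleright} x_o \,] \\[4pt]
Var(var(x_t)) & := & [\, splitSt(x_\sigma) \mathrel{?\triangleright} (x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h});\ read(x_t, x_{f_s}) \mathrel{?\triangleright} x_{f_v}; \\
 & & \ \ mkSt(x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h}) \mathrel{?\triangleright} x_{f_\sigma};\ mkValSt(x_{f_v}, x_{f_\sigma}) \mathrel{?\triangleright} x_o \,] \\[4pt]
In(in) & := & [\, splitSt(x_\sigma) \mathrel{?\triangleright} (x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h});\ in(x_{f_i}) \mathrel{?\triangleright} (x_{f_v}, x_{f_{i'}}); \\
 & & \ \ mkSt(x_{f_{i'}}, x_{f_o}, x_{f_s}, x_{f_h}) \mathrel{?\triangleright} x_{f_\sigma};\ mkValSt(x_{f_v}, x_{f_\sigma}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Alloc(ref(x_t)) & := & [\, H(x_\sigma, x_t, x_{f_w});\ getValSt(x_{f_w}) \mathrel{?\triangleright} (x_{f_v}, x_{f_\sigma}); \\
 & & \ \ splitSt(x_{f_\sigma}) \mathrel{?\triangleright} (x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h});\ alloc(x_{f_h}, x_{f_v}) \mathrel{?\triangleright} (x_{f_{h'}}, x_{f_l}); \\
 & & \ \ locVal(x_{f_l}) \mathrel{?\triangleright} x_{f_{v'}};\ mkSt(x_{f_i}, x_{f_o}, x_{f_s}, x_{f_{h'}}) \mathrel{?\triangleright} x_{f_{\sigma'}}; \\
 & & \ \ mkValSt(x_{f_{v'}}, x_{f_{\sigma'}}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Acc(!x_t) & := & [\, H(x_\sigma, x_t, x_{f_w});\ getValSt(x_{f_w}) \mathrel{?\triangleright} (x_{f_v}, x_{f_\sigma});\ isLoc(x_{f_v}) \mathrel{?\triangleright} x_{f_l}; \\
 & & \ \ splitSt(x_{f_\sigma}) \mathrel{?\triangleright} (x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h});\ get(x_{f_l}, x_{f_h}) \mathrel{?\triangleright} x_{f_{v'}}; \\
 & & \ \ mkSt(x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h}) \mathrel{?\triangleright} x_{f_{\sigma'}};\ mkValSt(x_{f_{v'}}, x_{f_{\sigma'}}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Add(x_{t_1} + x_{t_2}) & := & [\, H(x_\sigma, x_{t_1}, x_{f_{w_1}});\ getValSt(x_{f_{w_1}}) \mathrel{?\triangleright} (x_{f_{v_1}}, x_{f_{\sigma_1}});\ isInt(x_{f_{v_1}}) \mathrel{?\triangleright} x_{f_{n_1}}; \\
 & & \ \ H(x_{f_{\sigma_1}}, x_{t_2}, x_{f_{w_2}});\ getValSt(x_{f_{w_2}}) \mathrel{?\triangleright} (x_{f_{v_2}}, x_{f_{\sigma_2}});\ isInt(x_{f_{v_2}}) \mathrel{?\triangleright} x_{f_{n_2}}; \\
 & & \ \ add(x_{f_{n_1}}, x_{f_{n_2}}) \mathrel{?\triangleright} x_{f_n};\ intVal(x_{f_n}) \mathrel{?\triangleright} x_{f_v};\ mkValSt(x_{f_v}, x_{f_{\sigma_2}}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Eq(x_{t_1} = x_{t_2}) & := & [\, H(x_\sigma, x_{t_1}, x_{f_{w_1}});\ getValSt(x_{f_{w_1}}) \mathrel{?\triangleright} (x_{f_{v_1}}, x_{f_{\sigma_1}});\ isInt(x_{f_{v_1}}) \mathrel{?\triangleright} x_{f_{n_1}}; \\
 & & \ \ H(x_{f_{\sigma_1}}, x_{t_2}, x_{f_{w_2}});\ getValSt(x_{f_{w_2}}) \mathrel{?\triangleright} (x_{f_{v_2}}, x_{f_{\sigma_2}});\ isInt(x_{f_{v_2}}) \mathrel{?\triangleright} x_{f_{n_2}}; \\
 & & \ \ eq(x_{f_{n_1}}, x_{f_{n_2}}) \mathrel{?\triangleright} x_{f_b};\ boolVal(x_{f_b}) \mathrel{?\triangleright} x_{f_v};\ mkValSt(x_{f_v}, x_{f_{\sigma_2}}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Neg(\neg x_t) & := & [\, H(x_\sigma, x_t, x_{f_w});\ getValSt(x_{f_w}) \mathrel{?\triangleright} (x_{f_v}, x_{f_\sigma});\ isBool(x_{f_v}) \mathrel{?\triangleright} x_{f_b}; \\
 & & \ \ neg(x_{f_b}) \mathrel{?\triangleright} x_{f_{b'}};\ boolVal(x_{f_{b'}}) \mathrel{?\triangleright} x_{f_{v'}};\ mkValSt(x_{f_{v'}}, x_{f_\sigma}) \mathrel{?\triangleright} x_o \,]
\end{array}
\]

Fig. 16. Skeletal semantics for extended While (Expressions)

Instantiation of concrete interpretation. We instantiate the in and out sorts with lists of values,
denoted by $L$. We instantiate locations as integers. A heap is a pair of an integer (the next free
location) and a map from integers to values. We instantiate the state sort as a tuple of in, out, store,
and heap, the valState sort as a pair of val and state, and the excState sort as a pair of a Boolean
and state. The val sort is extended to include a case for locations, as well as the intVal, boolVal,
isInt, and isBool filters. The locVal filter injects a location in the val type, and the isLoc filter
applies if the val argument is a location, which it then returns. The in filter applies if the input
list is not empty; it returns its head and its tail. The alloc filter applied to $((n, m), v)$ returns the
heap $(n + 1, m + n \mapsto v)$. The get filter applies if the location is in the heap, and it returns the
corresponding value. The set filter applies if the location is in the heap, and it returns the heap
updated with the given value. The out filter always applies and adds the given value to the output
list. The mkOK filter (resp. the mkExc filter) always applies and builds a pair of true (resp. false) and
the given state. The isOK filter (resp. the isExc filter) applies if the Boolean is true (resp. is false);
it then returns the state component of the tuple. Other filters build or deconstruct tuples.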
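
A few of these concrete filters can be sketched directly. The Python encoding is an assumption: a heap is a pair of the next free location and a dictionary, streams are lists, and `None` stands for "the filter does not apply". Returning the fresh location from `alloc` follows the signature in Figure 15 rather than the prose above, and the trailing underscores only avoid clashes with Python keywords and builtins.

```python
def alloc(heap, v):
    """Store v at the next free location; return the new heap and that location."""
    n, m = heap
    return (n + 1, {**m, n: v}), n

def get(loc, heap):
    """Applies only if loc is in the heap."""
    _, m = heap
    return m[loc] if loc in m else None

def set_(loc, heap, v):
    """Applies only if loc is in the heap; returns the updated heap."""
    n, m = heap
    return (n, {**m, loc: v}) if loc in m else None

def in_(stream):
    """Applies only if the input list is not empty; returns its head and tail."""
    return (stream[0], stream[1:]) if stream else None

def out(stream, v):
    """Always applies; appends v to the output list."""
    return stream + [v]

def mk_ok(state):
    return (True, state)

def mk_exc(state):
    return (False, state)

def is_ok(exc_state):
    ok, state = exc_state
    return state if ok else None

def is_exc(exc_state):
    ok, state = exc_state
    return None if ok else state

heap1, loc = alloc((0, {}), 5)
print(heap1, loc, get(loc, heap1), set_(loc, heap1, 7))
```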

Instantiation of abstract interpretation. To illustrate the flexibility of our approach, we choose a
coarse abstraction for the in and out sorts (they are either $\bot$ or a single abstract value), and a precise
abstraction of heaps: abstract heaps are modelled similarly to concrete heaps as a pair $(n, m^\sharp)$ of an
integer and a mapping from integers to abstract values. Locations are abstracted as sets of integers.
Tuples are abstracted as tuples of the abstraction of their components. The tuple-manipulating
abstract filters are straightforward, so we only detail the other ones in Figure 18.
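
The abstract heap filters can be illustrated with a toy value domain. The sketch below is only one reading of the abstraction described here: abstract values are modelled as frozensets of concrete values (so join is set union and bottom is the empty set), which is an assumption, and the function names are hypothetical.

```python
def get_sharp(locs, heap):
    """Join of the abstract values stored at every location in `locs`."""
    _, m = heap
    result = frozenset()
    for l in locs:
        # A location absent from the heap contributes bottom (assumption).
        result |= m.get(l, frozenset())
    return result

def set_sharp(locs, heap, v):
    """Weak update: every location in `locs` may receive v, so join it in."""
    n, m = heap
    return (n, {l: (old | v if l in locs else old) for l, old in m.items()})

def alloc_sharp(heap, v):
    """Allocation stays precise: the next free location is a concrete integer."""
    n, m = heap
    return (n + 1, {**m, n: v})

heap = (2, {0: frozenset({1}), 1: frozenset({2})})
print(get_sharp({0, 1}, heap))
print(set_sharp({0}, heap, frozenset({9})))
```

The update in `set_sharp` is weak because a set of locations may denote several concrete cells, so the old abstract value has to be kept alongside the new one.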

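Lemma 8.1. The abstract filters are consistent with the concrete filters.

\[
\begin{array}{lcl}
Skip(skip) & := & [\, mkOK(x_\sigma) \mathrel{?\triangleright} x_o \,] \\[4pt]
Throw(throw) & := & [\, mkExc(x_\sigma) \mathrel{?\triangleright} x_o \,] \\[4pt]
Asn(x_{t_1} := x_{t_2}) & := & [\, H(x_\sigma, x_{t_2}, x_{f_w});\ getValSt(x_{f_w}) \mathrel{?\triangleright} (x_{f_v}, x_{f_\sigma}); \\
 & & \ \ splitSt(x_{f_\sigma}) \mathrel{?\triangleright} (x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h});\ write(x_{t_1}, x_{f_s}, x_{f_v}) \mathrel{?\triangleright} x_{f_{s'}}; \\
 & & \ \ mkSt(x_{f_i}, x_{f_o}, x_{f_{s'}}, x_{f_h}) \mathrel{?\triangleright} x_{f_{\sigma'}};\ mkOK(x_{f_{\sigma'}}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Set(x_{t_1} \leftarrow x_{t_2}) & := & [\, H(x_\sigma, x_{t_1}, x_{f_{w_1}});\ getValSt(x_{f_{w_1}}) \mathrel{?\triangleright} (x_{f_{v_1}}, x_{f_\sigma});\ isLoc(x_{f_{v_1}}) \mathrel{?\triangleright} x_{f_l}; \\
 & & \ \ H(x_{f_\sigma}, x_{t_2}, x_{f_{w_2}});\ getValSt(x_{f_{w_2}}) \mathrel{?\triangleright} (x_{f_{v_2}}, x_{f_{\sigma'}}); \\
 & & \ \ splitSt(x_{f_{\sigma'}}) \mathrel{?\triangleright} (x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h});\ set(x_{f_l}, x_{f_h}, x_{f_{v_2}}) \mathrel{?\triangleright} x_{f_{h'}}; \\
 & & \ \ mkSt(x_{f_i}, x_{f_o}, x_{f_s}, x_{f_{h'}}) \mathrel{?\triangleright} x_{f_{\sigma''}};\ mkOK(x_{f_{\sigma''}}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Out(out(x_{t_1})) & := & [\, H(x_\sigma, x_{t_1}, x_{f_w});\ getValSt(x_{f_w}) \mathrel{?\triangleright} (x_{f_v}, x_{f_\sigma}); \\
 & & \ \ splitSt(x_{f_\sigma}) \mathrel{?\triangleright} (x_{f_i}, x_{f_o}, x_{f_s}, x_{f_h});\ out(x_{f_o}, x_{f_v}) \mathrel{?\triangleright} x_{f_{o'}}; \\
 & & \ \ mkSt(x_{f_i}, x_{f_{o'}}, x_{f_s}, x_{f_h}) \mathrel{?\triangleright} x_{f_{\sigma'}};\ mkOK(x_{f_{\sigma'}}) \mathrel{?\triangleright} x_o \,] \\[4pt]
Seq(x_{t_1}; x_{t_2}) & := & \left[\, H(x_\sigma, x_{t_1}, x_{f_e});\ \left( \begin{array}{l} isOK(x_{f_e}) \mathrel{?\triangleright} x_{f_\sigma};\ H(x_{f_\sigma}, x_{t_2}, x_o) \\ isExc(x_{f_e}) \mathrel{?\triangleright} x_{f_{\sigma'}};\ mkExc(x_{f_{\sigma'}}) \mathrel{?\triangleright} x_o \end{array} \right)_{\{x_o\}} \,\right] \\[10pt]
Try(\mathit{try}\ x_{t_1}\ \mathit{catch}\ x_{t_2}) & := & \left[\, H(x_\sigma, x_{t_1}, x_{f_e});\ \left( \begin{array}{l} isOK(x_{f_e}) \mathrel{?\triangleright} x_{f_\sigma};\ mkOK(x_{f_\sigma}) \mathrel{?\triangleright} x_o \\ isExc(x_{f_e}) \mathrel{?\triangleright} x_{f_{\sigma'}};\ H(x_{f_{\sigma'}}, x_{t_2}, x_o) \end{array} \right)_{\{x_o\}} \,\right] \\[10pt]
If(\mathit{if}\ x_{t_1}\ x_{t_2}\ x_{t_3}) & := & \left[\, \begin{array}{l} H(x_\sigma, x_{t_1}, x_{f_w});\ getValSt(x_{f_w}) \mathrel{?\triangleright} (x_{f_v}, x_{f_\sigma});\ isBool(x_{f_v}) \mathrel{?\triangleright} x_{f_b}; \\ \left( \begin{array}{l} isTrue(x_{f_b});\ H(x_{f_\sigma}, x_{t_2}, x_o) \\ isFalse(x_{f_b});\ H(x_{f_\sigma}, x_{t_3}, x_o) \end{array} \right)_{\{x_o\}} \end{array} \,\right] \\[14pt]
While(\mathit{while}\ x_{t_1}\ x_{t_2}) & := & \left[\, \begin{array}{l} H(x_\sigma, x_{t_1}, x_{f_w});\ getValSt(x_{f_w}) \mathrel{?\triangleright} (x_{f_v}, x_{f_\sigma});\ isBool(x_{f_v}) \mathrel{?\triangleright} x_{f_b}; \\ \left( \begin{array}{l} isTrue(x_{f_b});\ H(x_{f_\sigma}, x_{t_2}, x_{f_e});\ \left( \begin{array}{l} isOK(x_{f_e}) \mathrel{?\triangleright} x_{f_{\sigma'}};\ H(x_{f_{\sigma'}}, \mathit{while}\ x_{t_1}\ x_{t_2}, x_o) \\ isExc(x_{f_e}) \mathrel{?\triangleright} x_{f_{\sigma''}};\ mkExc(x_{f_{\sigma''}}) \mathrel{?\triangleright} x_o \end{array} \right)_{\{x_o\}} \\ isFalse(x_{f_b});\ mkOK(x_{f_\sigma}) \mathrel{?\triangleright} x_o \end{array} \right)_{\{x_o\}} \end{array} \,\right]
\end{array}
\]

Fig. 17. Skeletal semantics for While 2 (Statements)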
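
To make the exception-propagating branches concrete, here is a minimal sketch of how the Skip, Throw, Seq and Try skeletons behave under the concrete instantiation, where an excState is a pair of a Boolean and a state. It is a hand-written evaluator for illustration, not the generic interpretation of the paper; the tuple encoding of statements and the name `run` are assumptions.

```python
def run(stmt, state):
    """Evaluate a statement; the result is an excState (ok flag, state)."""
    kind = stmt[0]
    if kind == "skip":
        return (True, state)            # mkOK
    if kind == "throw":
        return (False, state)           # mkExc
    if kind == "seq":
        ok, s = run(stmt[1], state)     # hook on the first statement
        if ok:                          # isOK branch: continue with the second
            return run(stmt[2], s)
        return (False, s)               # isExc branch: propagate with mkExc
    if kind == "try":
        ok, s = run(stmt[1], state)     # hook on the protected statement
        if ok:                          # isOK branch: rebuild with mkOK
            return (True, s)
        return run(stmt[2], s)          # isExc branch: run the handler
    raise ValueError(f"unknown statement: {kind}")

print(run(("seq", ("throw",), ("skip",)), "s0"))
print(run(("try", ("seq", ("throw",), ("skip",)), ("skip",)), "s0"))
```

The two branches of each conditional are mutually exclusive here because the concrete isOK and isExc filters never apply to the same excState.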


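Lemma 8.2. The abstract semantics of Extended While is correct.

\[
\begin{array}{rcl}
\gamma(v^\sharp) & = & \{ L \mid \forall v \in L.\ v \in \gamma(v^\sharp) \} \\
\gamma(l^\sharp) & = & l^\sharp \\
\gamma((a^\sharp, b^\sharp, c^\sharp)) & = & \{ (a, b, c) \mid a \in \gamma(a^\sharp) \wedge b \in \gamma(b^\sharp) \wedge c \in \gamma(c^\sharp) \} \\
\gamma((n, m^\sharp)) & = & \{ (n, m) \mid dom(m) = dom(m^\sharp) \wedge \forall i \in dom(m).\ m[i] \in \gamma(m^\sharp[i]) \} \\[6pt]
in^\sharp(v^\sharp) & = & (v^\sharp, v^\sharp) \\
out^\sharp(v_1^\sharp, v_2^\sharp) & = & v_1^\sharp \sqcup v_2^\sharp \\
isOK^\sharp(b^\sharp, s^\sharp) & = & \begin{cases} s^\sharp & \text{if } true^\sharp \sqsubseteq b^\sharp \\ \bot_{state} & \text{otherwise} \end{cases} \\
isExc^\sharp(b^\sharp, s^\sharp) & = & \begin{cases} s^\sharp & \text{if } false^\sharp \sqsubseteq b^\sharp \\ \bot_{state} & \text{otherwise} \end{cases} \\
locVal^\sharp(l^\sharp) & = & (\bot_{int}, \bot_{bool}, l^\sharp) \\
isLoc^\sharp((i^\sharp, b^\sharp, l^\sharp)) & = & l^\sharp \\
get^\sharp(l^\sharp, (n, m^\sharp)) & = & \bigsqcup_{l \in l^\sharp} m^\sharp[l] \\
alloc^\sharp((n, m^\sharp), v^\sharp) & = & (n + 1,\ m^\sharp + n \mapsto v^\sharp) \\
set^\sharp(l^\sharp, (n, m^\sharp), v^\sharp) & = & (n, m'^\sharp) \text{ where } \begin{cases} m'^\sharp[l] = m^\sharp[l] \sqcup v^\sharp & \text{if } l \in l^\sharp \\ m'^\sharp[l] = m^\sharp[l] & \text{otherwise} \end{cases}
\end{array}
\]

Fig. 18. Abstract Interpretation of Extended While

9 RELATED WORK

Ott [Sewell et al. 2010] is a formalism for describing language semantics and type systems. Ott
proposes a meta-language with a humanly readable syntax for writing semantic definitions as
inference rules, and has facilities for translating these definitions into executable interpreters and
specifications in proof assistants such as Coq and HOL. Lem [Mulligan et al. 2014] offers a core
functional language extended with logical features from proof assistants for writing semantic
models. Ott can be used to describe static type systems but neither Ott nor Lem has been used to
derive program analyses.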

Action Semantics [Mosses 1992] was developed by Mosses and Watt as a modular format for
writing semantics. Turi and Plotkin [Turi and Plotkin 1997] propose a generic way of defining
small-step operational semantics, presented in a category-theoretic framework. More recently,
Churchill et al. [Churchill et al. 2015] address the issue of the reusability of operational
semantics. Their approach is based on structures called fundamental constructs, or funcons, which
only specify the changed parts of the state for a given construct. For instance, the funcon for if does
not mention environments, but only the Boolean part of the value which is needed. Funcons can
then be combined to build a programming language. There is a connection between these funcons
and our rules as they are both meant to capture the whole behaviour of a given language construct.
One difference is that funcons have a certain degree of sort polymorphism, meaning that, e.g.,
conditional statements and conditional expressions can be treated by the same “if”-funcon. Skeletal
semantics would treat each sort separately but would re-use the filters, so as to avoid increasing
the proof effort required. To the best of our knowledge, the work on funcons has been focused
on building extendable concrete semantics, and has never been used to build an abstract semantics.

Views [Dinsdale-Young et al. 2013] has a concrete operational semantics for control flow, but is 
parameterised on the state model and basic commands. It proposes a program logic for this language, 
which is parameterised on the actions of the basic commands. They prove a general soundness 
result stating that it suffices to check soundness for each basic command. This corresponds in our 
framework to the fact that only simple properties on filters need to be checked. Similarly, Keidel
et al. [Keidel et al. 2018] very recently proposed to capture the similarity between a concrete and an
abstract interpreter using a shared interpreter parameterised by arrows that could be instantiated
to concrete and abstract versions, thus reducing the proof effort needed.

Iris [Jung et al. 2017] is a concurrent separation logic framework. It is parameterised by a
small-step reduction relation and it proposes a logic to reason about resources. This logic is parameterised
by a representation of resources in the form of an algebraic structure called a “camera”. Cameras
come with local properties about resources that users have to check. These local constraints yield
the soundness of the Iris logic. To be used in practice, Iris requires its users to provide lemmas
about weakest preconditions for each language construct. These lemmas are easy to find and prove
in simple examples (such as a vanilla While), but they require a deep understanding of how
one reasons about the considered language. Such lemmas can be much more complex to express (let
alone prove) in complex languages such as JavaScript [Gardner et al. 2012]. We believe that our
framework guides the proof effort by local reasoning: at each step, the abstract interpretation
naturally considers every applicable branch.

The K framework [Roşu and Şerbănuţă 2010] proposes a formalism for writing operational
semantics and for constructing program verifiers directly on top of the semantic definitions, as
opposed to using an intermediate representation and/or a verification generator linked to a specific
program logic. The semantic rules are given as rewriting rules over terms of semantic state. The K
framework has been used to write semantic definitions of several real-world languages, including
C, Java, and JavaScript. The program verifiers are based on matching logic [Roşu 2017], a formalism
for reasoning about patterns and the set of terms that they match. A language-independent set of
proof rules defines a Reachability Logic which can reason about the set of reachable states of a
program. This has been instantiated to obtain program verifiers reasoning about data structures of
heap-manipulating programs in C, Java, and JavaScript [Ştefănescu et al. 2016].

The K framework has goals similar to ours: derive verifiers from operational semantics, correct 
by construction. A key difference is that the semantics of the K specification tool is complex and 
not clearly documented [Li and Gunter 2018]. In this work, we have focused on crystallising a 
general yet simple rule format. Our format enables a general definition of when a semantics is 
well-defined and provides a generic correctness theorem for the derived program verifiers that can 
be machine-checked in the Coq proof assistant. 

Schmidt initiated the abstract interpretation of big-step operational semantics [Schmidt 1995]
by showing how to abstract derivation trees (using co-induction to harness infinite derivations)
and derived classical data flow and control flow analyses as abstract interpretations. Other
systematic derivations of static analyses have taken small-step operational semantics as starting
point. Schmidt [Schmidt 1997b] discusses the general principles for such an approach and
compares small-step and big-step operational semantics as foundations for abstract interpretation.
Cousot [Cousot 1999] shows how to derive static analyses for an imperative language defined by a 
compositional transition semantics using the principles of abstract interpretation. Midtgaard and 
Jensen [Midtgaard and Jensen 2008] use a similar approach for calculating control-flow analyses for 
functional languages from operational semantics in the form of abstract machines. Van Horn and 
Might [Van Horn and Might 2010, 2011] show how a series of analyses for higher-order functional 
languages can be derived from operational semantics formulated as abstract machines. The atomic 
operations of the machines are given an abstract interpretation and it is shown that the “abstract 
abstract machines” can simulate all the transitions of the concrete abstract machine. The abstract 
machines used by Van Horn and Might can be expressed in our rule format: the atomic operations 
correspond to our filters and the simulation result corresponds to our consistency result for concrete 
and abstract interpretations. The two works differ slightly in scope in that we are interested in a 
general semantic rule format and its meta-theory whereas Van Horn and Might are concerned with 
giving a systematic derivation of advanced analyses for higher-order languages with state. 

Inspired by Schmidt, Bodin et al. [Bodin et al. 2015] identify a rule format that can be
systematically instantiated to both concrete and abstract semantics, with a generic consistency result. Our
work generalises their approach. Their rule format is based on a non-standard style of operational
semantics, called pretty-big-step operational semantics [Charguéraud 2013], which cuts up standard
big-step rules into many fine-grained rules. In our work, one skeleton describes the behaviour of
one language construct. Our skeletal semantics captures many forms of traditional operational
semantics, such as the traditional big-step semantics studied in this paper.

10 CONCLUSIONS AND FUTURE WORK 

We have introduced a new meta-language for capturing the behaviour of programming languages, 
called skeletal semantics. A skeleton provides a simple way of describing the complete behaviour of a 
language construct in a single definition. We have given a language-independent, generic definition 
of interpretation of a skeletal semantics, systematically deriving semantic judgements from the 
skeletons. We have explored four such interpretations: a well-formedness interpretation; a concrete 
interpretation; an abstract interpretation; and a constraint generator for flow-sensitive analysis. A 
key advantage of skeletal semantics is that we are able to establish general, language-independent
consistency results, which can then be instantiated to specific programming languages by proving
simple language-dependent filter lemmas.

In this paper, we have focused on proving the fundamental properties of skeletal semantics, using 
the simple While language and its extensions as illustrative examples. We have demonstrated that 
we can capture many language constructs including higher-order and object-oriented features. In 
future, we would like to explore how our formalism scales to real-world languages such as OCaml 
and JavaScript. For instance, the specification of JavaScript [ECMA 2018] is written in a style where 
the whole behaviour of each language construct is described in a single definition. It should be 
comparatively straightforward to provide a specification of JavaScript using a skeletal semantics. 

A distinguishing feature of skeletal semantics is that interpretations can be used to characterise 
several styles of semantics, independently of the language considered. In this paper, we have 
focused on big-step semantics. In future, we plan to capture other forms of semantics such as 
small-step operational semantics, semantics for describing concurrent, distributed and interactive 
computation, and abstract machines, using an approach similar to [Uustalu 2013]. 

We have interesting proof techniques that are worth further exploration. For example, we 
have demonstrated how to add an abstract rule for state splitting. The proof technique used to 
validate this abstract rule, namely that abstract interpretation is a greatest fixpoint, is not specific 
to state splitting. We thus want to explore other abstract rules validated by this greatest fixpoint. In 
particular, we conjecture that we can use this approach to obtain a frame rule for skeletal semantics, 
paving the way for the integration of separation logic as an abstract interpretation. It is also possible
to generate better (more precise) constraints than those given in Section 7, as well as constraints for
other analyses such as control flow analysis. Based on the skeletal semantics for the λ-calculus, we
have reproduced the constraint generation for 0-CFA [Palsberg 1995]. We are currently studying
how more advanced control flow analyses for other languages can be expressed in our framework.

Finally, we have mechanised in Coq the definitions of skeletal semantics and interpretations, and
have proved the general consistency results. We have formalised the well-formedness, concrete
and abstract interpretations, verifying that the abstract interpretation for the While language is
correct. We have also mechanised a skeletal semantics for the λ-calculus. We are currently studying
how to leverage this Coq mechanisation to build a certificate checker for abstract analysis.